How to read XML file #389

Open
paillave opened this issue Nov 10, 2022 Discussed in #388 · 3 comments
Labels: documentation (Documentation may be added or completed on the portal)

Comments

@paillave (Owner)

Discussed in #388

Originally posted by LordBigDuck November 10, 2022
Hello, I have gone through the documentation and source code but didn't manage to write working code to read an XML file. Could you provide some samples?

@paillave paillave added the documentation Documentation may be added or completed on the portal label Nov 10, 2022
@paillave paillave self-assigned this Nov 10, 2022
@paillave (Owner, Author)

paillave commented Nov 11, 2022

The XML reading system still needs improvement, but for now, here is how to read an XML file.
FYI, whatever the size of the XML file (even gigabytes), the memory required to read it stays constant; of course, if you use operators that need to load the full dataset into memory (like a sort), you will have issues.
Here is the setup from the command line:

dotnet new console -o TestXml
cd TestXml
dotnet add package Paillave.EtlNet.Core
dotnet add package Paillave.EtlNet.XmlFile

Here is the content of Program.cs:

// See https://aka.ms/new-console-template for more information
using System.Text;
using Paillave.Etl.Core;
using Paillave.Etl.XmlFile;
using Paillave.Etl.XmlFile.Core;

var testXmlContent = @"<root>
    <elt1 v1=""qwe""><v2>asd</v2></elt1>
    <elt2 v3=""yxc""><v4>rtz</v4></elt2>
    <elt1 v1=""mnb""><v2>poi</v2></elt1>
</root>";

var res = await StreamProcessRunner.CreateAndExecuteAsync("dummy", DefineProcess);
Console.WriteLine(res.Failed ? $"fail: {res.ErrorTraceEvent}" : "Success");

void DefineProcess(ISingleStream<string> contextStream)
{
    var xmlNodes = contextStream
        .Select("create in memory file with content for test", _ => FileValue.Create(new MemoryStream(Encoding.UTF8.GetBytes(testXmlContent)), "example.xml", "testContent"))
        .CrossApplyXmlFile("parse xml", new MyXmlFileDefinition());
    // Each node type is recognized by the class its mapper returns
    xmlNodes.XmlNodeOfType<Elt1Node>("only Elt1").Do("write elt1", i => Console.WriteLine($"Node type 1 : {i.V1} - {i.V2}"));
    xmlNodes.XmlNodeOfType<Elt2Node>("only Elt2").Do("write elt2", i => Console.WriteLine($"Node type 2 : {i.V3} - {i.V4}"));
}

class MyXmlFileDefinition : XmlFileDefinition
{
    public MyXmlFileDefinition()
    {
        this.AddNodeDefinition(XmlNodeDefinition.Create("elt1", "/root/elt1", i => new Elt1Node
        {
            V1 = i.ToXPathQuery<string>("/root/elt1/@v1"),
            V2 = i.ToXPathQuery<string>("/root/elt1/v2"),
        }));
        // elt2 must map to Elt2Node, otherwise XmlNodeOfType<Elt2Node> never matches
        this.AddNodeDefinition(XmlNodeDefinition.Create("elt2", "/root/elt2", i => new Elt2Node
        {
            V3 = i.ToXPathQuery<string>("/root/elt2/@v3"),
            V4 = i.ToXPathQuery<string>("/root/elt2/v4"),
        }));
    }
}
class Elt1Node
{
    public string V1 { get; set; }
    public string V2 { get; set; }
}
class Elt2Node
{
    public string V3 { get; set; }
    public string V4 { get; set; }
}
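To make the constant-memory point above concrete in general terms, here is a minimal sketch of forward-only XML reading with the standard System.Xml.XmlReader, which pulls one node at a time instead of loading the whole document. This is not how Paillave.EtlNet is implemented internally; StreamingXmlSketch and ReadElt1 are hypothetical names, and the element names follow the test document in Program.cs.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: illustrates streaming reads, not Paillave's internals.
static class StreamingXmlSketch
{
    // Yields (V1, V2) for every <elt1> element, reading the input forward-only
    // so memory use does not grow with the size of the file.
    public static IEnumerable<(string? V1, string? V2)> ReadElt1(TextReader input)
    {
        using var reader = XmlReader.Create(input);
        // ReadToFollowing skips any other elements (such as <elt2>) along the way
        while (reader.ReadToFollowing("elt1"))
        {
            string? v1 = reader.GetAttribute("v1");
            string? v2 = null;
            // ReadSubtree exposes only the current <elt1> element; disposing it
            // leaves the outer reader on </elt1>, ready for the next match.
            using (var sub = reader.ReadSubtree())
            {
                if (sub.ReadToDescendant("v2"))
                    v2 = sub.ReadElementContentAsString();
            }
            yield return (v1, v2);
        }
    }
}
```

Because ReadElt1 is an iterator, each pair is produced as the reader reaches it; collecting everything into a list first (as a sort would) gives up that benefit.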

@felipepodolan

felipepodolan commented Feb 24, 2023

I am also having some issues with the XML Reader. I am trying to make it work without the need to create any specific class. Is this possible? What I have tried:

class Program
{
    static void Main(string[] args)
    {
        string fileName = @"C:\path_to_my_file\file.xml";

        var xmlFileDefinition = new XmlFileDefinition();

        xmlFileDefinition.AddNodeDefinition(
            XmlNodeDefinition.Create("V2", "/ns:root", i => i.ToXPathQuery<string>("/ns:root/ns:elt1/ns:v2")));

        xmlFileDefinition.AddNodeDefinition(
            XmlNodeDefinition.Create("V1", "/ns:root", i => i.ToXPathQuery<string>("/ns:root/ns:elt1/@v1")));

        xmlFileDefinition.AddNameSpace("ns", "some_namespace");

        XmlObjectReader reader = new XmlObjectReader(xmlFileDefinition);

        // Open the actual file rather than passing a null stream
        using var stream = File.OpenRead(fileName);
        // Action<XmlNodeParsed> callback invoked for each parsed node
        reader.Read(stream, fileName, parsed => { /* handle each parsed node here */ }, CancellationToken.None);
    }
}

@paillave (Owner, Author)

paillave commented Feb 24, 2023

Hi Felipe. At the moment, you MUST return a concrete class, because what you provide is treated not as a factory but as a mapper (it is an expression, not a delegate). Moreover, it is the type of the returned element that lets you identify the emitted elements with the XmlNodeOfType operator.
This is a subject I may dig into further, as I will also work on a fast JSON parser implementation, and I believe the two will share algorithms. The way to set up extraction from this kind of tree-structured file (XML, JSON, YAML...) may need to change compared to what I have done so far.
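As a hedged illustration of the expression-versus-delegate point, the sketch below uses System.Linq.Expressions to show what an expression tree exposes that a compiled delegate does not: the property assignments of an object initializer. This is an assumption about why a concrete class is required, not Paillave's actual code; MapperSketch, MappedMembers and SampleNode are hypothetical, and a plain string stands in for the library's node-query parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical target type, similar in shape to Elt1Node from the first sample.
class SampleNode
{
    public string? V1 { get; set; }
    public string? V2 { get; set; }
}

static class MapperSketch
{
    // Lists the property names assigned by a mapper such as
    // i => new SampleNode { V1 = ..., V2 = ... }. Because the mapper is an
    // expression tree rather than a compiled delegate, its structure is inspectable.
    public static IReadOnlyList<string> MappedMembers<TIn, TOut>(Expression<Func<TIn, TOut>> mapper)
    {
        if (mapper.Body is MemberInitExpression init)
            return init.Bindings.Select(b => b.Member.Name).ToList();

        // A body that does not build an object (e.g. a method call returning a
        // string) has no member assignments to map.
        return Array.Empty<string>();
    }
}
```

With `s => new SampleNode { V1 = s, V2 = s.ToUpper() }` the method reports `V1` and `V2`; with `s => s.Trim()` the body is a method call rather than an object initializer, so there is nothing to map, which is roughly the situation of a mapper that returns a bare string. The TOut type parameter also hints at where a node type for filtering could come from, in the spirit of XmlNodeOfType.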
