Added a function profile_gen to generate profile formatted output from any given xml. Fixes #5. #12

tinkerbotfoo · 2016-10-28T00:36:12Z

This is fix for the issue #5

Note: This is a initial version. I also wrote a test function. The code works and the test passes.
Following the agile pattern, make it work and make it better. I have made it to work and pass tests.
I am sure there is lot of room for improvements or even a total redesign. Open to suggestions and advice.

λ python -m xmldataset
Parse using profile output :

{'title_and_author': [{'title': "XML Developer's Guide", 'author': 'Gambardella, Matthew'}, {'title': 'Midnight Rain', 'author': 'Ralls, Kim'}, {'title': 'Maeve Ascendant', 'author': 'Corets, Eva'}, {'title': "Oberon's Legacy", 'author': 'Corets, Eva'}, {'title': 'The Sundered Grail', 'author': 'Corets, Eva'}, {'title': 'Lover Birds', 'author': 'Randall, Cynthia'}, {'title': 'Splish Splash', 'author': 'Thurman, Paula'}, {'title': 'Creepy Crawlies', 'author': 'Knorr, Stefan'}, {'title': 'Paradox Lost', 'author': 'Kress, Peter'}, {'title': 'Microsoft .NET: The Programming Bible', 'author': "O'Brien, Tim"}, {'title': 'MSXML3: A Comprehensive Guide', 'author': "O'Brien, Tim"}, {'title': 'Visual Studio 7: A Comprehensive Guide', 'author': 'Galos, Mike'}]}

Profile Gen output :

catalog
        lowest
                specificbefore
                        specificvalue
                book
                        optionalexternal
                                externaldata
                        author
                        title
                        genre
                        price
                        publish_date
                        description
                specificafter
                        specificvalue

…iple levels into the output

…any given xml

tinkerbotfoo · 2016-10-28T00:45:43Z

To Do:

Current implementation is limited to the xml nodes. It doesn't expand the attributes of the nodes into the profile. It can be done in another update.
While using this my team also felt that we can extend the profile gen function and data profiling stats at each level. This will be useful to know which nodes are used quite often.

tinkerbotfoo · 2016-10-28T01:23:24Z

Looks like python 3 build is breaking at this line (list assignment/ in place modification).
path_list[p_idx]= str(re.sub('\[.+]','',item))

Its works fine in python 2.7. I tried to resolve the issue. I haven't worked much with python 3. Appreciate if you can help ....

tinkerbotfoo · 2016-10-28T23:28:53Z

Fixed the python 3 issue. Plz check and lemme know. I am working on the other 2 ToDo items

…n the profile output

…e output as child of the element

tinkerbotfoo · 2016-10-29T00:37:02Z

Implemented the first todo: feature to add include_attributes flag to include the attributes in the profile output as child of the element. Also added a new test for this feature.

"number", "id" would not show up if the include_attributes flag is set to false., since they are attributes of xml elements.

plz review/test and lemme know your thoughts

catalog
    lowest
        number
        specificbefore
            specificvalue
        book
            id
            optionalexternalstart
                externaldata
            author
            title
            genre
            price
            publish_date
            description
            optionalexternalend
                externaldata
        book2
            id
            author
            title
            genre
            price
            publish_date
            description
        specificafter
            specificvalue

spurin · 2016-10-30T13:30:03Z

I think this is great, it would be good if we can include an option on the profile_gen function for also creating the dataset stubs, i.e. title = dataset:default

Can you see if you can refactor to just use the xml.etree.ElementTree without having a separate requirement for lxml. I originally chose this module as I would like the module to work without the installation of other modules (especially where there may be compilation requirements to satisfy the dependency). When creating the Perl equivalent, this ended up causing me headaches in terms of usage.

I also already have the logic in the code for handling elements/attributes so that can be reused where applicable.

Really appreciate your efforts, thanks again. Will aim for this and your other changes for the next release.

tinkerbotfoo · 2016-10-31T17:20:46Z

Thanks very much for your feedback and being accomodating .. I see 3 asks in this ..

refactor to use just xml.etree without lxml - i concur with you. I used lxml because it was convenient for the initial version. I will work on refactor to just use xml.etree.
Sry i am missing something to understand the request - Can you plz elaborate dataset stubs, i.e. title = dataset:default ..
I was originally planning to refactor the class _XMLDataset ., i needed some more time to do debug sessions and fully understand your logic and integrate with it seamlessly ...

tinkerbotfoo · 2016-10-31T17:26:18Z

Also i would welcome any insights/suggestions/your thoughts in to class _XMLDataset refactoring .., would be useful to me and appreciated ... Also plz feel free to lemme know if anything is needed to make it easy for you to integrate this functionality. You have done great job of structuring and clean code.. i have tried to be consistent with your approach where ever i was able to.. but there is much to be desired ....

tinkerbotfoo added 5 commits October 25, 2016 13:27

Added another example to demonstrate getting external value from mult…

4c4a2ef

…iple levels into the output

Formatting fix

9f7dff0

added a description for the profile

c8d7170

added some more description for the example 12

4fa5b26

Added a function profile_gen to generate profile formatted file from …

51aef48

…any given xml

tinkerbotfoo added 2 commits October 27, 2016 17:53

added lxml requirement

3b4d1e0

fix for python 3

fe31829

tinkerbotfoo mentioned this pull request Oct 28, 2016

A tool for showing the tree structure would be nice. #5

Closed

python3 iisue fix - replaced the filter function with list slicer

d59d92a

tinkerbotfoo added 2 commits October 28, 2016 16:53

Initial prep for implementing the feature to include xml attributes i…

e27683b

…n the profile output

Added include attributes flag to include the attributes in the profil…

04e04de

…e output as child of the element

spurin closed this May 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added a function profile_gen to generate profile formatted output from any given xml. Fixes #5. #12

Added a function profile_gen to generate profile formatted output from any given xml. Fixes #5. #12

tinkerbotfoo commented Oct 28, 2016 •

edited

tinkerbotfoo commented Oct 28, 2016

tinkerbotfoo commented Oct 28, 2016

tinkerbotfoo commented Oct 28, 2016

tinkerbotfoo commented Oct 29, 2016

spurin commented Oct 30, 2016

tinkerbotfoo commented Oct 31, 2016

tinkerbotfoo commented Oct 31, 2016

Added a function profile_gen to generate profile formatted output from any given xml. Fixes #5. #12

Added a function profile_gen to generate profile formatted output from any given xml. Fixes #5. #12

Conversation

tinkerbotfoo commented Oct 28, 2016 • edited

tinkerbotfoo commented Oct 28, 2016

tinkerbotfoo commented Oct 28, 2016

tinkerbotfoo commented Oct 28, 2016

tinkerbotfoo commented Oct 29, 2016

spurin commented Oct 30, 2016

tinkerbotfoo commented Oct 31, 2016

tinkerbotfoo commented Oct 31, 2016

tinkerbotfoo commented Oct 28, 2016 •

edited