Capture Bullet points #1134

python = 10.x.x
Windows OS
python-docx module = 0.8.11

I am trying to read bulleted data from a Word document using document.xml and numbering.xml.
The code below selects paragraphs that have a w:ilvl element and skips any that contain w:del (deleted content).
It ignores paragraphs whose w:numId value is 0, because that value has no equivalent in numbering.xml.
For the remaining paragraphs, it captures the bullet text and the bullet format type from numbering.xml.
```
from docx import Document

# Namespace map passed explicitly on the w:abstractNum / w:lvl lookups below
W_NS = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}

document = Document("\\path_to_file.docx")
num_ele = document.part.numbering_part.element
doc_ele = document._part._element
# Paragraphs that carry a list level (w:ilvl) and contain no deleted content (w:del)
word_paragraphs = doc_ele.xpath(".//w:p[boolean(.//w:pPr//w:numPr//w:ilvl)][not(boolean(.//w:del))]")
for word_paragraph in word_paragraphs:
    number_properties = word_paragraph.xpath(".//w:pPr//w:numPr")[0]
    number_id = number_properties.xpath(".//w:numId//@w:val")[0]
    # numId 0 has no matching w:num entry in numbering.xml, so skip it
    if number_id == '0':
        continue
    # Join the text of every w:t inside the paragraph's runs (an empty w:t has text None)
    word_text = ''.join(t.text or '' for t in word_paragraph.xpath(".//w:r//w:t"))
    indentation_level = number_properties.xpath(".//w:ilvl//@w:val")[0]
    # Resolve w:num -> w:abstractNum -> w:lvl -> w:numFmt
    abstract_num_id = num_ele.xpath(".//w:num[@w:numId=" + number_id + "]//w:abstractNumId//@w:val")[0]
    abstract_num = num_ele.xpath(".//w:abstractNum[@w:abstractNumId=" + abstract_num_id + "]")[0]
    word_level = abstract_num.xpath(".//w:lvl[@w:ilvl=" + indentation_level + "]", namespaces=W_NS)[0]
    number_format = word_level.xpath(".//w:numFmt//@w:val", namespaces=W_NS)[0]
```
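
For reference, the lookup chain the code walks (w:numId, then w:num, then w:abstractNum, then w:lvl, then w:numFmt) can be reproduced without python-docx. This is only a minimal sketch using the standard library; `SAMPLE_NUMBERING` is a hand-written fragment, and `w_attr` and `resolve_num_format` are hypothetical helper names, not part of python-docx. It ignores w:lvlOverride.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written numbering.xml fragment, for illustration only.
SAMPLE_NUMBERING = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="3">
    <w:lvl w:ilvl="0"><w:numFmt w:val="bullet"/></w:lvl>
    <w:lvl w:ilvl="1"><w:numFmt w:val="decimal"/></w:lvl>
  </w:abstractNum>
  <w:num w:numId="1"><w:abstractNumId w:val="3"/></w:num>
</w:numbering>
""".strip()


def w_attr(element, name):
    """Read an attribute in the w: namespace, e.g. w:val."""
    return element.get(f"{{{W}}}{name}")


def resolve_num_format(numbering_root, num_id, ilvl):
    """Return the w:numFmt value for (numId, ilvl), or None if unresolved."""
    for num in numbering_root.findall("w:num", NS):
        if w_attr(num, "numId") != num_id:
            continue
        abstract_ref = num.find("w:abstractNumId", NS)
        if abstract_ref is None:
            return None
        abstract_id = w_attr(abstract_ref, "val")
        for abstract_num in numbering_root.findall("w:abstractNum", NS):
            if w_attr(abstract_num, "abstractNumId") != abstract_id:
                continue
            for lvl in abstract_num.findall("w:lvl", NS):
                if w_attr(lvl, "ilvl") == ilvl:
                    num_fmt = lvl.find("w:numFmt", NS)
                    return None if num_fmt is None else w_attr(num_fmt, "val")
    return None


root = ET.fromstring(SAMPLE_NUMBERING)
print(resolve_num_format(root, "1", "0"))
print(resolve_num_format(root, "1", "1"))
```

Returning None instead of indexing `[0]` avoids an IndexError when a numId has no matching entry.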

After extracting the data, I save it to a database.
When required, I regenerate the document with the python-docx module. For bulleted points, I have observed that python-docx writes numbering.xml in a different format from the one MS Word produces.

The basic differences I have identified are shown below.

python-docx numbering.xml

![python-docx numbering.xml](https://user-images.githubusercontent.com/23114153/186830640-6a734131-27d3-40ea-9008-7b4bcfb4e4f7.png)

MS Word numbering.xml

![MS Word numbering.xml](https://user-images.githubusercontent.com/23114153/186830845-366bec0b-2d47-4a42-a73a-a8bfccd9bde9.png)

The indentation levels also differ between the two, as the following document.xml files show.

Word-generated document.xml

![word generated document.xml](https://user-images.githubusercontent.com/23114153/186831156-d4e088aa-47a6-4571-b537-cb998694f8bb.png)

python-docx-generated document.xml

![python-docx generated document.xml](https://user-images.githubusercontent.com/23114153/186831360-1a77007d-9bee-4ce0-a355-688ed6d47ece.png)

We can see that python-docx always keeps w:ilvl in document.xml as '0' and updates numbering.xml accordingly. Because of this difference in format, I have the following questions.
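
To make the comparison between the two document.xml files concrete, a small script can list the (numId, ilvl, text) of every list paragraph in each file. The following is a rough sketch using only the standard library; `SAMPLE_DOCUMENT` is a hand-written fragment and `list_levels` is a hypothetical helper name. It assumes that a missing w:ilvl can be treated as level 0.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written document.xml fragment, for illustration only.
SAMPLE_DOCUMENT = f"""
<w:document xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:numPr><w:ilvl w:val="1"/><w:numId w:val="2"/></w:numPr></w:pPr>
    <w:r><w:t>Nested item</w:t></w:r></w:p>
  <w:p><w:pPr><w:numPr><w:numId w:val="5"/></w:numPr></w:pPr>
    <w:r><w:t>No ilvl</w:t></w:r></w:p>
  <w:p><w:r><w:t>Plain paragraph</w:t></w:r></w:p>
</w:body></w:document>
""".strip()


def list_levels(document_root):
    """Return (numId, ilvl, text) for each paragraph that has w:numPr."""
    rows = []
    for paragraph in document_root.iter(f"{{{W}}}p"):
        num_pr = paragraph.find("w:pPr/w:numPr", NS)
        if num_pr is None:
            continue  # not a list paragraph
        num_id_el = num_pr.find("w:numId", NS)
        ilvl_el = num_pr.find("w:ilvl", NS)
        num_id = None if num_id_el is None else num_id_el.get(f"{{{W}}}val")
        # Assumption: treat an absent w:ilvl as level 0.
        ilvl = "0" if ilvl_el is None else ilvl_el.get(f"{{{W}}}val")
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W}}}t"))
        rows.append((num_id, ilvl, text))
    return rows


for row in list_levels(ET.fromstring(SAMPLE_DOCUMENT)):
    print(row)
```

Running this over the Word-generated and the python-docx-generated document.xml should show which numId values carry the nesting in each case.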

**Questions:**
Could you please suggest possible ways to maintain a similar implementation for both types?

Or give me an idea of how to distinguish a Word-edited document from one edited or modified by python-docx. I tried custom tags, but that did not work because MS Word removes them automatically.
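
One idea I have not verified: instead of custom tags, stamp a marker into the core properties when generating the file (python-docx exposes `document.core_properties.last_modified_by`) and read it back later. This sketch assumes Word rewrites lastModifiedBy when it saves, which needs checking against real files. `MARKER`, `last_modified_by` and `was_saved_by_generator` are hypothetical names; the reader side uses only the standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
MARKER = "my-docx-generator"  # hypothetical marker value written at generation time


def last_modified_by(docx_file):
    """Return cp:lastModifiedBy from docProps/core.xml, or None if absent."""
    with zipfile.ZipFile(docx_file) as package:
        if "docProps/core.xml" not in package.namelist():
            return None
        root = ET.fromstring(package.read("docProps/core.xml"))
    element = root.find(f"{{{CP}}}lastModifiedBy")
    return None if element is None else element.text


def was_saved_by_generator(docx_file):
    """True if the marker is still in place, i.e. nothing has overwritten it."""
    return last_modified_by(docx_file) == MARKER
```

`docx_file` can be a path or a file-like object, since `zipfile.ZipFile` accepts both.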


