# XML Handling in Pandas for Data Science

This notebook is part of my data science journey. It shows how to work with XML data using Python and Pandas, including real-world concepts such as XPath, namespaces, and API integration.



## What is XML?

- **XML** stands for **eXtensible Markup Language**.
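- XML is a **markup language** like HTML, but its tags are not predefined: you define your own.
- It is **self-descriptive**: tag names describe the data they hold, and the data is stored in a structured, hierarchical (tree) format.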
- It was **designed to store and transport data**.
- XML is a **W3C recommendation** and is often used in **web APIs, config files, and data interchange**.
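
To make "self-descriptive" and "hierarchical" concrete, here is a minimal sketch using Python's built-in `xml.etree.ElementTree` (not pandas). It parses a tiny document and walks its tree; the `<library>`/`<book>` tags and their values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up XML document: tag names describe the data they hold
doc = """<library>
  <book id="1">
    <title>Python Basics</title>
    <year>2020</year>
  </book>
  <book id="2">
    <title>Data Wrangling</title>
    <year>2022</year>
  </book>
</library>"""

root = ET.fromstring(doc)


def walk(element, depth=0):
    """Print each element's tag, attributes, and text, indented by depth."""
    text = (element.text or "").strip()
    print("  " * depth + f"{element.tag} {element.attrib} {text}".rstrip())
    for child in element:
        walk(child, depth + 1)


walk(root)

# The same hierarchy flattened into records, one per <book>
books = [
    {"id": b.get("id"), "title": b.findtext("title"), "year": int(b.findtext("year"))}
    for b in root.findall("book")
]
print(books)
```

Flattening a tree into a list of records like this is essentially what `pd.read_xml` does for you in the steps below.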



## Project Goals

1. **Read XML data into a Pandas DataFrame**
2. **Convert DataFrame back to XML**
3. **Understand XML in real-world use**
4. Learn about:
   - XPath
   - Namespaces
   - XML in APIs


## Step 1: Read XML File and Display as DataFrame

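```python
import pandas as pd

# Read the XML file into a DataFrame.
# read_xml uses the lxml parser by default; pass parser='etree' if lxml isn't installed.
df = pd.read_xml('data/shapes.xml')
df
```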


|   | shape    | degrees | sides |
|---|----------|---------|-------|
| 0 | square   | 360     | 4.0   |
| 1 | circle   | 360     | NaN   |
| 2 | triangle | 180     | 3.0   |

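The file `data/shapes.xml` isn't included here. As a minimal sketch, assuming it holds the same records as the XML string in Step 2, the block below writes such a file to a temporary directory and reads it back. `parser="etree"` uses Python's built-in XML parser, so `lxml` isn't required.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed content of data/shapes.xml (same records as the string in Step 2)
SHAPES_XML = """<?xml version="1.0" encoding="utf-8"?>
<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides></sides></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "shapes.xml"
    path.write_text(SHAPES_XML, encoding="utf-8")
    # parser="etree" uses xml.etree.ElementTree instead of lxml
    df_file = pd.read_xml(str(path), parser="etree")

print(df_file)
```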

## Step 2: Define the XML as a String

```python
# The shapes data as an XML string: a <data> root with one <row> per record
xml = '''<?xml version="1.0" encoding="utf-8"?>
<data>
  <row>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides></sides>
  </row>
  <row>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>'''
```
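
`read_xml` can only parse well-formed XML: a single root element, with every tag properly nested and closed. A minimal sketch, using the standard library's `xml.etree.ElementTree` and a hypothetical helper `is_well_formed`, checks a string before it is handed to pandas.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False on a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<data><row><shape>square</shape></row></data>"
bad = "<data><row><shape>square</row></data>"  # <shape> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```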


## Step 3: Convert XML String to DataFrame

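```python
from io import StringIO

# Wrap the string in a file-like object: passing a literal XML string
# directly to read_xml is deprecated since pandas 2.1.
df = pd.read_xml(StringIO(xml))
print(df)
```

```
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0
```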
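
In the output, `sides` is a float column: the values are written as `4.0`/`3.0`, and the empty `<sides></sides>` element becomes a missing value (`NaN`, which is itself a float). A short, self-contained sketch with a compact copy of the Step 2 data inspects the inferred dtypes, finds the missing row, and optionally switches to pandas' nullable `Int64` type.

```python
from io import StringIO

import pandas as pd

shapes_xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides></sides></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

shapes_df = pd.read_xml(StringIO(shapes_xml), parser="etree")

# Inferred types: degrees holds whole numbers, sides holds floats plus NaN
print(shapes_df.dtypes)

# Rows where <sides> was empty
print(shapes_df[shapes_df["sides"].isna()])

# Optional: the nullable Int64 type stores whole numbers alongside <NA>
shapes_df["sides"] = shapes_df["sides"].astype("Int64")
print(shapes_df)
```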


## Step 4: Convert DataFrame to XML and Save

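```python
# Write the DataFrame to an XML file. By default the index is also written,
# as an <index> element inside each <row>; pass index=False to omit it.
df.to_xml('shapes1.xml')
```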
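
`to_xml` also takes options that shape the output. In the minimal sketch below, calling it without a path returns the XML as a string, `index=False` drops the index, and `root_name`/`row_name` rename the wrapper elements (`shapes`/`record` are illustrative names). Reading the string back with a matching `xpath` checks the round trip.

```python
from io import StringIO

import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {
        "shape": ["square", "circle", "triangle"],
        "degrees": [360, 360, 180],
        "sides": [4.0, np.nan, 3.0],
    }
)

# No path given, so to_xml returns the document as a string.
# parser="etree" avoids the default lxml dependency.
xml_out = frame.to_xml(index=False, root_name="shapes", row_name="record", parser="etree")
print(xml_out)

# Read it back, selecting the renamed row elements
roundtrip = pd.read_xml(StringIO(xml_out), xpath=".//record", parser="etree")
print(roundtrip)
```

The NaN in `sides` is written as an empty element, which `read_xml` turns back into NaN.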


## What is XPath?

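- XPath is a **query language** for selecting elements and attributes in an XML document by their path through the tree.
- Think of it as **SQL for XML**: in `pd.read_xml`, the `xpath` parameter picks which elements become the rows of the DataFrame.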

Example usage:
```python
# './/row' matches every <row> element at any depth below the root
df = pd.read_xml('data/shapes.xml', xpath='.//row')
```
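
XPath can also filter: a predicate in square brackets keeps only matching elements, which is where the SQL comparison fits best (roughly a `WHERE` clause). This minimal sketch keeps only rows whose `<degrees>` is 360, using the Step 2 data. It uses `parser="etree"`, whose XPath support is the limited subset in `xml.etree.ElementTree`. The default `lxml` parser supports full XPath 1.0.

```python
from io import StringIO

import pandas as pd

shapes_xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides></sides></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# './/row' selects every <row>; "[degrees='360']" keeps only rows whose
# <degrees> child has the text 360 (ElementTree compares text as strings)
full_circle = pd.read_xml(StringIO(shapes_xml), xpath=".//row[degrees='360']", parser="etree")
print(full_circle)
```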


## Real Example with Nested XML + XPath

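```python
from io import StringIO

# Example nested XML: <degrees> and <sides> sit one level below <shape>
nested_xml = '''<shapes>
  <shape>
    <name>square</name>
    <geometry>
      <degrees>360</degrees>
      <sides>4</sides>
    </geometry>
  </shape>
</shapes>'''

# read_xml parses only the immediate children (and attributes) of each node
# matched by xpath, so './/shape' gives a 'name' column but does not flatten
# the values nested inside <geometry> into columns.
shapes = pd.read_xml(StringIO(nested_xml), xpath='.//shape')

# Pointing xpath at <geometry> returns the nested degrees/sides values
geometry = pd.read_xml(StringIO(nested_xml), xpath='.//geometry')
geometry
```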
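
Because `read_xml` stops at the immediate children of the matched nodes, one way to flatten nested XML is to build the records yourself and then create the DataFrame. This minimal sketch uses the standard library's `ElementTree` on the nested example above, with a second, made-up `<shape>` added. Each `<shape>` becomes one row, with the `<geometry>` children pulled up into top-level columns. (`read_xml` can also flatten through its `stylesheet` parameter, which requires lxml. That route isn't shown here.)

```python
import xml.etree.ElementTree as ET

import pandas as pd

nested_xml = """<shapes>
  <shape>
    <name>square</name>
    <geometry><degrees>360</degrees><sides>4</sides></geometry>
  </shape>
  <shape>
    <name>triangle</name>
    <geometry><degrees>180</degrees><sides>3</sides></geometry>
  </shape>
</shapes>"""

root = ET.fromstring(nested_xml)

records = []
for shape in root.iter("shape"):
    record = {"name": shape.findtext("name")}
    geometry = shape.find("geometry")
    if geometry is not None:
        # Pull each child of <geometry> up to a top-level column
        for child in geometry:
            record[child.tag] = child.text
    records.append(record)

flat = pd.DataFrame(records)
# Element text is always a string, so convert the numeric columns explicitly
flat[["degrees", "sides"]] = flat[["degrees", "sides"]].astype(int)
print(flat)
```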


## What are XML Namespaces?

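```python
from io import StringIO

# Example XML where every element uses the 'ns' prefix bound to a URI.
# The shapes sit under a root element so './/ns:shape' has nodes to match.
ns_xml = '''<ns:shapes xmlns:ns="http://example.com/schema">
  <ns:shape>
    <ns:name>square</ns:name>
  </ns:shape>
</ns:shapes>'''

# Every prefix used in xpath must be mapped to its URI via `namespaces`
df_ns = pd.read_xml(StringIO(ns_xml), xpath='.//ns:shape',
                    namespaces={'ns': 'http://example.com/schema'})
df_ns
```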
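
A common variant is a *default* namespace (`xmlns="..."` with no prefix). The elements still belong to that namespace, so the XPath still needs a prefix: pick any temporary prefix and map it to the URI in `namespaces`. This minimal sketch reuses the placeholder URI from above; `doc` is an arbitrary prefix chosen for illustration.

```python
from io import StringIO

import pandas as pd

ns_xml = """<shapes xmlns="http://example.com/schema">
  <shape><name>square</name><sides>4</sides></shape>
  <shape><name>triangle</name><sides>3</sides></shape>
</shapes>"""

# 'doc' only has to match the namespaces mapping below; it never
# appears in the document itself
df_default_ns = pd.read_xml(
    StringIO(ns_xml),
    xpath=".//doc:shape",
    namespaces={"doc": "http://example.com/schema"},
    parser="etree",
)
print(df_default_ns)
```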


## XML in APIs (Real-World Use)

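```python
import requests
from io import StringIO

# Sample only: replace the placeholder URL with a real XML endpoint
# response = requests.get('https://example.com/data.xml', timeout=10)
# response.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
# df = pd.read_xml(StringIO(response.text))
```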
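
The `requests` + `StringIO` pattern above needs a live endpoint. As an offline sketch, the block below starts a throwaway local HTTP server that serves made-up data as a stand-in for a real API, then passes its URL straight to `pd.read_xml`, which can read http(s) URLs itself. This is an alternative to `requests`. With `requests` you get finer control over headers, authentication, and error handling.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import pandas as pd

# Made-up payload standing in for an API response
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<data>
  <row><shape>square</shape><degrees>360</degrees></row>
  <row><shape>triangle</shape><degrees>180</degrees></row>
</data>"""


class XMLHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(SAMPLE)))
        self.end_headers()
        self.wfile.write(SAMPLE)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), XMLHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = f"http://127.0.0.1:{server.server_port}/data.xml"
    api_df = pd.read_xml(url, parser="etree")
finally:
    server.shutdown()
    server.server_close()

print(api_df)
```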


## Summary

| Task                         | Method                         |
|------------------------------|--------------------------------|
| Read XML file                | `pd.read_xml('file.xml')`      |
| Read XML from string         | `pd.read_xml(StringIO(xml))`   |
| Save DataFrame to XML        | `df.to_xml('output.xml')`      |
| Read nested XML              | Use `xpath` parameter          |
| Handle namespaces            | Use `namespaces` parameter     |



## Final Tips

- XML is everywhere: **APIs, config files, mobile data, and more**.
- **XPath** helps select exact elements.
- Combine with `requests` for **live XML data from APIs**.
