## Day15 - Data Manipulation using Pandas (Part3 - Read JSON, HTML, Excel and pickle files)

- Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures.
- Self Learning Resource
    - Pandas in 10 Minutes: <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html"> Click Here </a>
    - Video1: https://www.youtube.com/watch?v=QUClKFFn1Vk
    - Video2: https://www.youtube.com/watch?v=tW1BWtQRZ2M
    - Video3: https://www.youtube.com/watch?v=xx4Vkc5RrWY
    


##### Note: 
1. First clean the environment (go to the "Kernel" menu --> "Restart & Clear Output").
2. To execute the code --> click on a cell and press Ctrl + Enter.

# <span style='color:Red'>Working with JSON, HTML, Excel and pickle files</span>

##  <span style='color:Blue'>1. Read JSON and write CSV</span>

### 1.1 JSON data

In [None]:
Data = '{"EName": ["James","Max"],"Email": ["j@il.com","m@il.com"],"Profile": ["Team Lead", "Sr. Developer"]}'
Data

### 1.2 JSON to DataFrame

In [None]:
import pandas as pd

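from io import StringIO

# JSON data held in a string (not a file)
Data = '{"EName": ["James","Max"],"Email": ["j@il.com","m@il.com"],"Profile": ["Team Lead", "Sr. Developer"]}'

# Wrap the string in StringIO: passing a raw JSON string to read_json is deprecated since pandas 2.1
df = pd.read_json(StringIO(Data))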
df

#type(df)
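
The string above is column-oriented: each key maps to a list of values. JSON from web services often arrives as a list of records instead. The following is a minimal sketch of reading that layout, assuming made-up sample records and the illustrative name `df_records`; `orient="records"` tells `read_json` how the data is arranged.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: the same two employees, one JSON object per row.
records = (
    '[{"EName": "James", "Email": "j@il.com", "Profile": "Team Lead"},'
    ' {"EName": "Max", "Email": "m@il.com", "Profile": "Sr. Developer"}]'
)

# Wrap the string in StringIO so pandas treats it as a file-like object.
df_records = pd.read_json(StringIO(records), orient="records")

print(df_records.shape)          # (2, 3)
print(list(df_records.columns))  # ['EName', 'Email', 'Profile']
```

Both layouts produce the same two-row, three-column DataFrame.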

### 1.3 JSON to DataFrame to CSV

In [None]:
import pandas as pd

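from io import StringIO

# JSON data held in a string (not a file)
Data = '{"EName": ["James","Max"],"Email": ["j@il.com","m@il.com"],"Profile": ["Team Lead", "Sr. Developer"]}'

# Wrap the string in StringIO: passing a raw JSON string to read_json is deprecated since pandas 2.1
df = pd.read_json(StringIO(Data))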

df.to_csv('15JData.csv')
# Open 15JData.csv
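
By default `to_csv` also writes the row index as the first column, so the saved file has an unnamed leading column. The sketch below compares the two behaviours, assuming a small made-up DataFrame (`df_demo`) and using in-memory text instead of a file.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row sample used only for this comparison.
df_demo = pd.DataFrame(
    {"EName": ["James", "Max"], "Profile": ["Team Lead", "Sr. Developer"]}
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df_demo.to_csv()
without_index = df_demo.to_csv(index=False)

print(with_index.splitlines()[0])     # ,EName,Profile
print(without_index.splitlines()[0])  # EName,Profile

# Reading the first version back turns the saved index into an extra column.
reloaded = pd.read_csv(StringIO(with_index))
print(list(reloaded.columns))         # ['Unnamed: 0', 'EName', 'Profile']
```

Passing `index=False` is usually the better choice when the index is just the default 0, 1, 2, ... numbering.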

### 1.4 Read wine data from a URL and save to CSV

In [None]:
# Part1: Read Data
import pandas as pd

#df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None)
df = pd.read_csv('wine15.data', header=None)

df.head()

##### Write to CSV

In [None]:
df.to_csv("wine15.csv", index = False)

### 1.5 Read Wine CSV 

In [None]:
import pandas as pd

df = pd.read_csv('wine15.csv')
df.head()
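
Because the wine data was read with `header=None`, its columns are labelled 0, 1, 2, ... and those labels are written to the CSV as a header row. After reading the CSV back, the labels are strings rather than integers. Here is a minimal sketch of that effect, assuming a three-column made-up sample in place of the real file; the column names assigned at the end are illustrative only.

```python
from io import StringIO

import pandas as pd

# Hypothetical three-column sample standing in for the wine data.
raw = "1,14.23,1.71\n1,13.20,1.78\n2,12.37,0.94\n"

df_raw = pd.read_csv(StringIO(raw), header=None)
print(list(df_raw.columns))   # [0, 1, 2]

# Saving writes the labels 0,1,2 as a header row; reading it back gives strings.
df_back = pd.read_csv(StringIO(df_raw.to_csv(index=False)))
print(list(df_back.columns))  # ['0', '1', '2']

# Descriptive names avoid the ambiguity (these names are illustrative only).
df_raw.columns = ["wine_class", "alcohol", "malic_acid"]
```

So `df[0]` works on the original DataFrame, while the reloaded one needs `df['0']`.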

##  <span style='color:Blue'>2. Read Data from HTML</span>

### 2.1 Read HTML content (Bank List)

In [None]:
import pandas as pd

#url = 'https://www.fdic.gov/bank/individual/failed/banklist.html'
url = 'banklist15.html'

table = pd.read_html(url)  # Returns a list of DataFrames, one per <table> found

#table

#table[0]

#type(table)

# Question: Write the table content to CSV file
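
One possible approach to the question above is sketched here. It assumes a tiny made-up HTML page in place of `banklist15.html`, hypothetical output names (`table_0.csv`, ...), and a temporary folder so nothing is left on disk. `read_html` needs an HTML parser such as `lxml`, so the sketch falls back to an empty list when none is installed.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical one-table page standing in for banklist15.html.
html = """
<table>
  <tr><th>Bank Name</th><th>City</th></tr>
  <tr><td>First Example Bank</td><td>Springfield</td></tr>
  <tr><td>Second Example Bank</td><td>Riverton</td></tr>
</table>
"""

try:
    tables = pd.read_html(StringIO(html))  # always a list, even for one table
except ImportError:
    tables = []  # no parser available (e.g. lxml, or bs4 with html5lib)

saved = []
with tempfile.TemporaryDirectory() as folder:
    for i, tbl in enumerate(tables):
        path = os.path.join(folder, f"table_{i}.csv")
        tbl.to_csv(path, index=False)    # one CSV file per table
        saved.append(pd.read_csv(path))  # read back to check the result

print(len(tables), "table(s) written")
```

With the notebook's own data, the same loop can be run over `table` and pointed at a folder of your choice.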


### 2.2 Read HTML content (Mobile_country_code)

In [None]:
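import pandas as pd

#url = 'https://en.wikipedia.org/wiki/Mobile_country_code'
url = 'mcc15.html'

# Keep only tables whose text matches 'Country'; use the first row as the header
table = pd.read_html(url, match='Country', header=0)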

table 

#table[0]

#type(table)

# Question: Write the table content to CSV file
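
The effect of `match` is easiest to see on a page with more than one table. The sketch below assumes a made-up two-table page in place of `mcc15.html`; only the second table contains the text "Country". As with any `read_html` call, a parser such as `lxml` must be installed, so the sketch falls back to empty lists otherwise.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second mentions "Country".
html = """
<table>
  <tr><th>Term</th><th>Meaning</th></tr>
  <tr><td>MNC</td><td>Mobile network code</td></tr>
</table>
<table>
  <tr><th>MCC</th><th>Country</th></tr>
  <tr><td>404</td><td>India</td></tr>
  <tr><td>310</td><td>United States</td></tr>
</table>
"""

try:
    all_tables = pd.read_html(StringIO(html))
    # match is a regular expression tested against the text inside each table.
    matched = pd.read_html(StringIO(html), match="Country", header=0)
except ImportError:
    all_tables, matched = [], []  # no HTML parser installed

print(len(all_tables), len(matched))  # 2 and 1 when a parser is installed
```

The result is still a list, so the matching table is `matched[0]`, and `matched[0].to_csv(...)` writes it to a CSV file.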


##  <span style='color:Blue'>3. Read excel file</span>

### 3.1 Read data from excel file

In [None]:
import pandas as pd

df = pd.read_excel('data15.xlsx')       # Reads the first sheet (index 0) by default
df.head()


### 3.2 Read excel sheets using name

In [None]:
import pandas as pd

df1 = pd.read_excel('data15.xlsx',sheet_name='file1')       # Read the sheet named 'file1' (sheet 0)
df2 = pd.read_excel('data15.xlsx',sheet_name='file2')       # Read the sheet named 'file2' (sheet 1)
df3 = pd.read_excel('data15.xlsx',sheet_name='file3')       # Read the sheet named 'file3' (sheet 2)

print("Sheet.file1 --> ", df1.shape)
print("Sheet.file2 --> ", df2.shape)
print("Sheet.file3 --> ", df3.shape)


### 3.3 Read excel sheets using number

In [None]:
import pandas as pd

df1 = pd.read_excel('data15.xlsx',sheet_name=0)       # Read Sheet 0
df2 = pd.read_excel('data15.xlsx',sheet_name=1)       # Read Sheet 1
df3 = pd.read_excel('data15.xlsx',sheet_name=2)       # Read Sheet 2

print("Sheet.0 --> ", df1.shape)
print("Sheet.1 --> ", df2.shape)
print("Sheet.2 --> ", df3.shape)
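
When every sheet is needed, `sheet_name=None` reads them all in one call and returns a dictionary keyed by sheet name. The sketch below assumes a small made-up workbook (`demo.xlsx`, built in a temporary folder) in place of `data15.xlsx`. Reading and writing `.xlsx` files needs an engine such as `openpyxl`, so the sketch leaves the dictionary empty when none is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for data15.xlsx with sheets file1, file2, file3.
frames = {
    "file1": pd.DataFrame({"a": [1, 2]}),
    "file2": pd.DataFrame({"a": [3, 4, 5]}),
    "file3": pd.DataFrame({"a": [6]}),
}

sheets = {}
try:
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.xlsx")
        with pd.ExcelWriter(path) as writer:
            for name, frame in frames.items():
                frame.to_excel(writer, sheet_name=name, index=False)
        # sheet_name=None returns a dict: sheet name -> DataFrame.
        sheets = pd.read_excel(path, sheet_name=None)
except ImportError:
    pass  # no Excel engine (such as openpyxl) is installed

for name, frame in sheets.items():
    print("Sheet." + name, "-->", frame.shape)
```

This avoids opening the workbook once per sheet, as the cells above do.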


##  <span style='color:Blue'>4. Pickling</span>
- Pickling stores a Python object on disk.
- e.g. 1: Storing a trained machine learning model to a file.
- e.g. 2: Storing a neural network after every 100 iterations, so that execution can resume from where it failed after a kernel restart or any other exception.
- Reload the pickle file when needed.
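
The checkpoint idea in the second example can be sketched with Python's built-in `pickle` module. This is only an illustration: the `state` dictionary, the placeholder "training" update, and the helper names `save_checkpoint` and `load_checkpoint` are all assumptions, not part of any library.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write any picklable Python object to disk."""
    with open(path, "wb") as fh:
        pickle.dump(state, fh)


def load_checkpoint(path):
    """Read the object back. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "checkpoint.pkl")
    state = {"iteration": 0, "weights": [0.0, 0.0]}

    for step in range(1, 251):
        state["iteration"] = step
        state["weights"] = [w + 0.01 for w in state["weights"]]  # placeholder update
        if step % 100 == 0:  # checkpoint after every 100 iterations
            save_checkpoint(state, path)

    # After a crash, work would resume from the last saved state.
    restored = load_checkpoint(path)

print(restored["iteration"])  # 200
```

The loop runs 250 steps, so the last checkpoint on disk is the one written at step 200.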

### 4.1 Save data to pickle file

In [None]:
import pandas as pd

df = pd.read_excel('data15.xlsx',sheet_name=0)       # Read Sheet 0

df.to_pickle('pickel15')

### 4.2 Read data from pickle file

In [None]:
import pandas as pd

df = pd.read_pickle('pickel15')

df.head()
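
One reason to prefer a pickle file over CSV for intermediate results is that it keeps column types exactly as they were. The sketch below compares the two round trips, assuming a small made-up DataFrame (`df_demo`) with a datetime column and a categorical column.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical sample with column types that plain text cannot carry.
df_demo = pd.DataFrame({
    "joined": pd.to_datetime(["2020-01-15", "2021-06-30"]),
    "level": pd.Categorical(["junior", "senior"]),
})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.pkl")
    df_demo.to_pickle(path)
    from_pickle = pd.read_pickle(path)

# CSV round trip, done in memory for comparison.
from_csv = pd.read_csv(StringIO(df_demo.to_csv(index=False)))

print(from_pickle.equals(df_demo))  # True: values and dtypes are preserved
print(from_csv.dtypes)              # both columns come back as plain text
```

As with any pickle, `read_pickle` should only be used on files from a trusted source.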