### <font color ='#2ECC71' >In this notebook:</font>

- Step 1: Read data from XML files and APIs


- Step 2: Convert it into a DataFrame


- Step 3: Store the data in a MongoDB database


- Step 4: Verify the load by reading the data back from the MongoDB database

## Index

- [Parsing XML File - (Function)](#parsing_xml_file)
- [Creating DataFrame from the parsed XML file - (Function)](#xml_to_df)
- [Fetching Data from API - (Function)](#api_fetch)
- [Loading data into MongoDB - (Function)](#mongodb_load)


- [Dataset 1](#dataset1)
- [Loading Dataset 1 into MongoDB](#dataset1_to_mongodb)


- [Dataset 2](#dataset2)
- [Loading Dataset 2 into MongoDB](#dataset2_to_mongodb)


- [Reading the data from MongoDB - (Verifying)](#reading_from_mongodb)

## Importing Libraries

In [17]:
import pandas as pd
import numpy as np

import lxml.etree as ET

from bs4 import BeautifulSoup
import requests

from pymongo import MongoClient

<a id = "parsing_xml_file"></a>

## Parsing XML File

In [2]:
# function for parsing the XML file with exception handling
# returns the root element, or None if the file cannot be read or parsed

def read_xml(xml_file):
    try:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        print('Successfully parsed the XML!')
        return root

    except FileNotFoundError:
        print('Error: Please check if the file exists')
        return None

    except ET.ParseError:
        print('Error: Failed to parse the XML file')
        return None

    # kept last because FileNotFoundError is a subclass of OSError
    except OSError:
        print('OSError: Error reading file: failed to load external entity')
        return None
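
As a quick check of the error-handling pattern above, the sketch below applies the same idea to XML held in a string. It swaps in the standard library's `xml.etree.ElementTree` (not `lxml`) so that it runs without extra dependencies; the helper name `parse_xml_text` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as StdET  # standard-library parser, used here instead of lxml


def parse_xml_text(xml_text):
    """Return the root element of xml_text, or None if it is not well-formed."""
    try:
        return StdET.fromstring(xml_text)
    except StdET.ParseError as error:
        # ParseError reports the position of the first problem found
        print('Error: Failed to parse the XML:', error)
        return None


good_root = parse_xml_text('<response><row><year>1990</year></row></response>')
bad_root = parse_xml_text('<response><row></response>')  # mismatched closing tag

print(good_root.tag)  # tag of the root element
print(bad_root)       # None, because parsing failed
```

Returning `None` instead of raising keeps the notebook running, but it means every caller has to check the result before using it.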

<a id = "xml_to_df"></a>

## Creating DataFrame from the parsed XML file

In [3]:
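# function to create the DataFrame from the parsed XML

def xml_to_dataframe(xml_file):

    root = read_xml(xml_file)

    # read_xml returns None on failure; stop here instead of raising AttributeError
    if root is None:
        return None

    # one dict per <row> element, mapping each child tag to its text
    data = []
    for row_element in root.findall('.//row'):
        row_data = {}
        for child_element in row_element:
            row_data[child_element.tag] = child_element.text
        data.append(row_data)

    df = pd.DataFrame(data)

    # drop the first row and the first column of the raw frame
    df = df.iloc[1:, 1:]

    print('\nData is stored in a DataFrame!\n')

    return df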
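
The slice `df.iloc[1:, 1:]` is easier to follow with a concrete picture of what `findall('.//row')` returns. The sketch below assumes, without having inspected the actual file, that the export wraps all records in an outer `<row>` element. `SAMPLE_XML` and `rows_to_records` are illustrative names, and the standard library's `xml.etree.ElementTree` stands in for `lxml`.

```python
import xml.etree.ElementTree as StdET  # standard library, used here instead of lxml

# Hypothetical miniature of the export: an outer <row> wrapping the record rows
SAMPLE_XML = (
    '<response><row>'
    '<row><year>1990</year><gas>CH4</gas></row>'
    '<row><year>1990</year><gas>CO2</gas></row>'
    '</row></response>'
)


def rows_to_records(root):
    """Collect one {tag: text} dict per <row> element, at any depth."""
    return [
        {child.tag: child.text for child in row_element}
        for row_element in root.findall('.//row')
    ]


records = rows_to_records(StdET.fromstring(SAMPLE_XML))

# The wrapper <row> is matched first and contributes a junk record keyed 'row'
print(records[0])
# The actual records follow
print(records[1:])
```

In a DataFrame built from these records, the junk record becomes the first row and its `row` key the first column, which is what a `[1:, 1:]` slice removes.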

<a id = "api_fetch"></a>

## Fetching Data from API

In [120]:
# function to fetch data through an API and store it in a DataFrame

def api_data_fetch(api):

    response = requests.get(api)

    if response.status_code == 200:
        print('Successfully requested the API!', response, '\n')

        json_format = response.json() # converting the response body from JSON into a dict

        gas_name = next(iter(json_format)) # the first key of the payload is the gas name
        print(gas_name, 'is being fetched..\n')

        json_data = json_format[gas_name] # list of records for that gas

        df = pd.DataFrame(json_data) # storing the records in a DataFrame
        print('Stored in DataFrame!', df.shape, '\n')

        return df

    else:
        print('Request failed with status', response.status_code, '\n')
        return None
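
The function relies on the response body being a JSON object whose first key names the gas and maps to a list of records. The sketch below isolates that step on a hand-written sample so it runs offline; `split_payload` and `SAMPLE_BODY` are illustrative names, and the sample's shape is an assumption based on how the function indexes the payload.

```python
import json

# Hypothetical body shaped like {"<gas name>": [record, record, ...]}
SAMPLE_BODY = (
    '{"co2": [{"year": "2013", "month": "2", "day": "6", '
    '"cycle": "396.08", "trend": "394.59"}]}'
)


def split_payload(payload):
    """Return (first_key, records) from a non-empty JSON object."""
    if not isinstance(payload, dict) or not payload:
        raise ValueError('expected a non-empty JSON object')
    first_key = next(iter(payload))  # dicts keep insertion order, so this is the first key
    return first_key, payload[first_key]


gas_name, records = split_payload(json.loads(SAMPLE_BODY))
print(gas_name, len(records))
```

Checking the payload shape up front turns an unexpected response into a clear `ValueError` instead of a `StopIteration` or `TypeError` further down.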
        
        

<a id = "mongodb_load"></a>

## Loading data into MongoDB

In [4]:
client = MongoClient("mongodb://localhost:27017")

In [5]:
# function to:
# 1. connect to MongoDB
# 2. select the database & collection (MongoDB creates them on the first insert)
# 3. insert records into MongoDB

def loading_to_mongodb(database_name, collection_name, dataframe):

    # creating a MongoDB client
    client = MongoClient("mongodb://localhost:27017")
    print(client)

    # selecting the database
    db = client[database_name]
    print('\nDatabase Created!\n', db)

    # selecting the collection (table) that will store the data
    collection = db[collection_name]
    print('\nCollection Created!\n', collection)

    # storing the dataset in MongoDB using insert_many(), one document per row
    insert_result = collection.insert_many(dataframe.to_dict('records'))
    print('\nRecords Inserted!\n', insert_result)
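
Because `insert_many` only adds documents, re-running a loading cell stores a second copy of every record. One way to make the load repeatable is to clear the collection first. The sketch below is a suggestion rather than part of the original notebook: `reload_collection` is a hypothetical helper that calls only `delete_many` and `insert_many`, and `InMemoryCollection` is a made-up stand-in so the example runs without a MongoDB server.

```python
from types import SimpleNamespace


def reload_collection(collection, records):
    """Empty the collection, then insert records; return the number inserted."""
    collection.delete_many({})  # an empty filter matches every document
    if not records:
        # insert_many rejects an empty list, so skip the write
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)


class InMemoryCollection:
    """Hypothetical stand-in that mimics two pymongo Collection methods."""

    def __init__(self):
        self.documents = []

    def delete_many(self, query):
        self.documents.clear()  # this stand-in only supports the empty filter

    def insert_many(self, records):
        self.documents.extend(records)
        return SimpleNamespace(inserted_ids=list(range(len(records))))


collection = InMemoryCollection()
rows = [{'year': '1990', 'gas': 'CH4'}, {'year': '1990', 'gas': 'CO2'}]

reload_collection(collection, rows)
reload_collection(collection, rows)  # second run, as when re-executing the notebook
print(len(collection.documents))     # still one copy of each record
```

With pymongo, the same helper would be called with a real collection such as `client[database_name][collection_name]`.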


<a id = "dataset1"></a>

### Dataset 1

In [7]:
# dataset 1 XML file path and name
xml_file1 = "Statewide_Greenhouse_Gas_Emissions__Beginning_1990.xml"

# calling the function to load the data into a DataFrame
df1 = xml_to_dataframe(xml_file1)

print('Number of rows and columns:', df1.shape)

df1.head()

Successfully parsed the XML!

Data is stored in a DataFrame!

Number of rows and columns: (13981, 13)


| | gross | net | conventional_accounting | economic_sector | sector | category | sub_category_1 | sub_category_2 | sub_category_3 | year | gas | mt_co2e_ar5_20_yr | mt_co2e_ar4_100_yr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1990 | CH4 | 4811 | 1432 |
| 2 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1990 | CO2 | 521347 | 521347 |
| 3 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1990 | N2O | 2268 | 2560 |
| 4 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1991 | CH4 | 5067 | 1508 |
| 5 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1991 | CO2 | 550680 | 550680 |


<a id = "dataset1_to_mongodb"></a>

### Loading Dataset 1 into MongoDB

In [8]:
# database name
database_name = 'DAP_Project_DB'

# collection/table name
collection_name = 'greenhouse_gas_emission'

# dataset (working copy of df1)
dataframe = df1.copy()

# function to load the data to MongoDB
loading_to_mongodb(database_name, collection_name, dataframe)

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

Database Created!
 Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'DAP_Project_DB')

Collection Created!
 Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'DAP_Project_DB'), 'greenhouse_gas_emission')

Records Inserted!
 <pymongo.results.InsertManyResult object at 0x00000135ED527340>


<a id = "dataset2"></a>

### Dataset 2

In [117]:
# APIs to fetch

co2_api = "https://global-warming.org/api/co2-api"

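# nitrous oxide is N2O (NO2 would be nitrogen dioxide); "no2" is only the variable prefix used here
no2_api = "https://global-warming.org/api/nitrous-oxide-api"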

methane_api = "https://global-warming.org/api/methane-api"

In [121]:
# calling api_data_fetch function

df_co2 = api_data_fetch(co2_api)

df_no2 = api_data_fetch(no2_api)

df_methane = api_data_fetch(methane_api)

Successfully requested the API! <Response [200]> 

co2 is being fetched..

Stored in DataFrame! (3960, 5) 

Successfully requested the API! <Response [200]> 

nitrous is being fetched..

Stored in DataFrame! (256, 5) 

Successfully requested the API! <Response [200]> 

methane is being fetched..

Stored in DataFrame! (466, 5) 



<a id = "dataset2_to_mongodb"></a>

### Loading Dataset 2 into MongoDB

In [122]:
# database name
database_name = 'DAP_Project_DB'

# function to load the data to MongoDB
loading_to_mongodb(database_name, 'co2_tb', df_co2)

loading_to_mongodb(database_name, 'no2_tb', df_no2)

loading_to_mongodb(database_name, 'methane_tb', df_methane)


MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

Database Created!
 Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'DAP_Project_DB')

Collection Created!
 Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'DAP_Project_DB'), 'co2_tb')

Records Inserted!
 <pymongo.results.InsertManyResult object at 0x00000135AD21BFA0>
MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

Database Created!
 Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'DAP_Project_DB')

Collection Created!
 Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'DAP_Project_DB'), 'no2_tb')

Records Inserted!
 <pymongo.results.InsertManyResult object at 0x00000135AD106370>
MongoClient(host=['localhost:27017'], document_class=dict,

<a id = "reading_from_mongodb"></a>

## Reading the data from MongoDB

In [14]:
# checking whether data is loaded properly
# using the find method without a filter to read all records

results = client['DAP_Project_DB']['greenhouse_gas_emission'].find()
print(results)

df1_read = pd.DataFrame(results)
df1_read.head()

<pymongo.cursor.Cursor object at 0x00000135ECB1DEE0>


| | _id | gross | net | conventional_accounting | economic_sector | sector | category | sub_category_1 | sub_category_2 | sub_category_3 | year | gas | mt_co2e_ar5_20_yr | mt_co2e_ar4_100_yr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 657768c6c9df27adb3203140 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1990 | CH4 | 4811 | 1432 |
| 1 | 657768c6c9df27adb3203141 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1990 | CO2 | 521347 | 521347 |
| 2 | 657768c6c9df27adb3203142 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1990 | N2O | 2268 | 2560 |
| 3 | 657768c6c9df27adb3203143 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1991 | CH4 | 5067 | 1508 |
| 4 | 657768c6c9df27adb3203144 | Yes | Yes | Yes | Buildings | Energy | Fuel Combustion | Commercial | Not Applicable | Coal | 1991 | CO2 | 550680 | 550680 |
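
The `_id` column in the output above is the identifier MongoDB adds to each inserted document; it is not part of the source data. If it is not wanted, pymongo can leave it out on the server side with a projection, `find({}, {'_id': 0})`. The sketch below shows the equivalent client-side step on plain dictionaries so it runs without a database; `without_object_id` is an illustrative helper, and the `_id` strings are plain-text stand-ins for `ObjectId` values.

```python
def without_object_id(documents):
    """Return copies of the documents with the '_id' key removed."""
    return [
        {key: value for key, value in document.items() if key != '_id'}
        for document in documents
    ]


# Stand-ins for documents returned by find(); values copied from the preview above
stored = [
    {'_id': '657768c6c9df27adb3203140', 'year': '1990', 'gas': 'CH4'},
    {'_id': '657768c6c9df27adb3203141', 'year': '1990', 'gas': 'CO2'},
]

print(without_object_id(stored))
```

The helper builds new dictionaries, so the documents in `stored` keep their `_id` values.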


In [123]:
results = client['DAP_Project_DB']['co2_tb'].find()
print(results)

df_read_co2 = pd.DataFrame(results)
df_read_co2.head()

<pymongo.cursor.Cursor object at 0x00000135AD221940>


| | _id | year | month | day | cycle | trend |
|---|---|---|---|---|---|---|
| 0 | 657867b6c9df27adb321f564 | 2013 | 2 | 6 | 396.08 | 394.59 |
| 1 | 657867b6c9df27adb321f565 | 2013 | 2 | 7 | 396.1 | 394.59 |
| 2 | 657867b6c9df27adb321f566 | 2013 | 2 | 8 | 396.12 | 394.6 |
| 3 | 657867b6c9df27adb321f567 | 2013 | 2 | 9 | 396.14 | 394.61 |
| 4 | 657867b6c9df27adb321f568 | 2013 | 2 | 10 | 396.16 | 394.62 |


In [124]:
results = client['DAP_Project_DB']['no2_tb'].find()
print(results)

df_read_no2 = pd.DataFrame(results)
df_read_no2.head()

<pymongo.cursor.Cursor object at 0x00000135AD3073D0>


| | _id | date | average | trend | averageUnc | trendUnc |
|---|---|---|---|---|---|---|
| 0 | 657867b6c9df27adb32204dd | 2002.5 | 316.85 | 316.88 | 0.14 | 0.13 |
| 1 | 657867b6c9df27adb32204de | 2002.6 | 316.83 | 316.92 | 0.14 | 0.13 |
| 2 | 657867b6c9df27adb32204df | 2002.7 | 316.82 | 316.95 | 0.14 | 0.14 |
| 3 | 657867b6c9df27adb32204e0 | 2002.8 | 316.82 | 316.99 | 0.14 | 0.14 |
| 4 | 657867b6c9df27adb32204e1 | 2002.9 | 316.87 | 317.03 | 0.14 | 0.14 |


In [125]:
results = client['DAP_Project_DB']['methane_tb'].find()
print(results)

df_read_methane = pd.DataFrame(results)
df_read_methane.head()

<pymongo.cursor.Cursor object at 0x00000135AD1B2880>


| | _id | date | average | trend | averageUnc | trendUnc |
|---|---|---|---|---|---|---|
| 0 | 657867b6c9df27adb32205de | 1984.11 | 1653.82 | 1649.98 | 0.96 | 0.58 |
| 1 | 657867b6c9df27adb32205df | 1984.12 | 1656.19 | 1651.07 | 1.06 | 0.58 |
| 2 | 657867b6c9df27adb32205e0 | 1985.1 | 1655.58 | 1652.15 | 0.96 | 0.58 |
| 3 | 657867b6c9df27adb32205e1 | 1985.2 | 1652.25 | 1653.16 | 1.36 | 0.58 |
| 4 | 657867b6c9df27adb32205e2 | 1985.3 | 1654.61 | 1654.16 | 1.0 | 0.58 |
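
`head()` shows that documents exist, but not that all of them arrived or that none were stored twice. A stricter check compares the number of documents in each collection with the number of rows that were loaded. In the sketch below, the expected counts are the DataFrame shapes printed earlier in this notebook, while the `actual` values are invented for illustration; in practice they could come from pymongo's `count_documents({})`. `find_count_mismatches` is a hypothetical helper.

```python
def find_count_mismatches(expected, actual):
    """Return {name: (expected_count, actual_count)} where the two differ."""
    return {
        name: (count, actual.get(name))
        for name, count in expected.items()
        if actual.get(name) != count
    }


# Row counts taken from the DataFrame shapes printed when the data was loaded
expected = {
    'greenhouse_gas_emission': 13981,
    'co2_tb': 3960,
    'no2_tb': 256,
    'methane_tb': 466,
}

# Invented counts: co2_tb is doubled, as it would be if its load cell ran twice
actual = dict(expected, co2_tb=7920)

print(find_count_mismatches(expected, actual))
```

An empty result means every collection holds exactly as many documents as the DataFrame it was loaded from.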


#### <font size='5' color='#1ABC9C'>We will retrieve data from MongoDB in Notebook 2 where EDA will be performed.</font>

In [26]:
client.close()