# How to load data from the MongoDB database into a dataframe

1. Create a file named `.env` in the root directory of the project.
2. Add the environment variable `MONGO_URI="<actual_uri>"` to it.
3. Paste the following code and run it. It runs as-is from the EDA folder. From another directory, you may need to adjust the `'../ETL_pipeline'` path, which is resolved relative to the current working directory. On success it prints "Pinged your deployment. You successfully connected to MongoDB!".

In [4]:
import sys
import os
sys.path.append(os.path.abspath('../ETL_pipeline'))
from get_data_from_cloud import get_dataframe_from_cloud
from dotenv import load_dotenv
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

load_dotenv()

# Get environment variables
MONGO_URI = os.getenv('MONGO_URI')
# Create a new client and connect to the server
client = MongoClient(MONGO_URI, server_api=ServerApi('1'))

try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

Pinged your deployment. You successfully connected to MongoDB!
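
If `MONGO_URI` is missing from `.env`, `os.getenv` returns `None`, so the client is never given the intended URI. A minimal sketch of a guard that fails early with a clearer message is shown here. The helper name `require_env` is hypothetical and is not part of the project code.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the project code.
    """
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable (None) and an empty string.
        raise RuntimeError(
            f"{name} is not set. Add it to the .env file in the project root."
        )
    return value
```

One way to use it would be to call `MONGO_URI = require_env("MONGO_URI")` right after `load_dotenv()`, in place of the plain `os.getenv` call.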


4. The dataset is split into 4 datasets and a codebook, and the full data is also available as a single collection. Set the `collection_name` variable to get the dataframe you want. The datasets' collection names are:
- "emissions": all data
- "country_data": data for countries/sectors/territories
- "continent_data": data for continents
- "nations_data": nations data (e.g. EU)
- "socioeconomic_data": data for socioeconomic classes (e.g. income)

Codebook data:
- "codebook": descriptions of all attributes

5. Run the following code, replacing `<collection_name>` with the collection name of the dataset you want:

In [None]:
collection = "<collection_name>"
df = get_dataframe_from_cloud(client=client, db_name="data", collection_name=collection)
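
MongoDB does not raise an error when a non-existent collection is queried; the query simply returns no documents. How `get_dataframe_from_cloud` handles that case is not shown here, so a typo in the collection name may go unnoticed. The sketch below checks the name against the collections listed in step 4 before querying. `KNOWN_COLLECTIONS` and `check_collection_name` are hypothetical names introduced for illustration.

```python
# Collection names as listed in this notebook (assumed to be complete).
KNOWN_COLLECTIONS = (
    "emissions",
    "country_data",
    "continent_data",
    "nations_data",
    "socioeconomic_data",
    "codebook",
)


def check_collection_name(name: str) -> str:
    """Return `name` unchanged if it is a known collection, else raise ValueError.

    Hypothetical helper for illustration; not part of the project code.
    """
    if name not in KNOWN_COLLECTIONS:
        options = ", ".join(KNOWN_COLLECTIONS)
        raise ValueError(f"Unknown collection {name!r}. Expected one of: {options}")
    return name
```

It could be used as `collection = check_collection_name("country_data")` before the call to `get_dataframe_from_cloud`.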

The following loads every collection listed in `VALID_OPTIONS` (which omits `nations_data`) and prints the first rows of each dataframe:

In [5]:
VALID_OPTIONS = ['emissions', 'country_data', 'continent_data', 'socioeconomic_data', 'codebook']

for collection in VALID_OPTIONS:
    df = get_dataframe_from_cloud(client=client, db_name="data", collection_name=collection)
    print(df.head())

Pinged your deployment. You successfully connected to MongoDB!
       country  year iso_code  population           gdp  cement_co2  \
0  Afghanistan  1950      AFG   7776182.0  9.421400e+09         0.0   
1  Afghanistan  1951      AFG   7879343.0  9.692280e+09         0.0   
2  Afghanistan  1952      AFG   7987783.0  1.001733e+10         0.0   
3  Afghanistan  1953      AFG   8096703.0  1.063052e+10         0.0   
4  Afghanistan  1954      AFG   8207953.0  1.086636e+10         0.0   

   cement_co2_per_capita    co2  co2_growth_abs  co2_growth_prct  ...  \
0                    0.0  0.084           0.070          475.000  ...   
1                    0.0  0.092           0.007            8.696  ...   
2                    0.0  0.092           0.000            0.000  ...   
3                    0.0  0.106           0.015           16.000  ...   
4                    0.0  0.106           0.000            0.000  ...   

   share_global_other_co2  share_of_temperature_change_from_ghg  \
0   