# **Practical Example: Integrating Data from Multiple Sources for COVID-19 Cases in the US**

In this exercise, we will walk through a practical example of data integration using the following sources:
1.	**Open Government API:** We'll use **The COVID Tracking Project API** to fetch COVID-19 case numbers for each US state.
2.	**CSV File:** A local CSV file contains state population data, which we merge with the COVID-19 case numbers to calculate cases per capita.

We'll use **Pandas** to handle the CSV file, **Requests** to interact with the API, and **PyMongo** to store the resulting dataset in **MongoDB**.


**Prerequisites:**

* Basic knowledge of Python.
* A MongoDB Atlas account (or a local MongoDB instance).
* The `pandas`, `requests`, and `pymongo` Python libraries installed (for example, with `pip install pandas requests pymongo`).

## **Step 1: Extracting Data from Multiple Sources**

**Fetching Data from an API:**

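APIs are a common source of frequently updated data. For this example, we'll fetch COVID-19 case data from **The COVID Tracking Project API**. The project stopped collecting data in March 2021, so the figures it returns are historical rather than live.
Here's how to make an API call using Python's `requests` library: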


```python
import requests
import pandas as pd

# Define the API endpoint for current COVID-19 data by state
api_url = "https://api.covidtracking.com/v1/states/current.json"

# Make a GET request to the API (the timeout keeps the call from hanging indefinitely)
response = requests.get(api_url, timeout=30)

# Check if the request was successful
if response.status_code == 200:
    # Convert the JSON response (a list of per-state records) to a Pandas DataFrame
    covid_data = pd.DataFrame(response.json())
    print(covid_data.head())
else:
    print(f"API request failed with status code {response.status_code}")
```


In this code:

* We use `requests.get()` to fetch data from the API.
* If the API call is successful (status code 200), the JSON response is converted to a Pandas DataFrame for further manipulation.
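A single `requests.get()` call fails outright on a transient network or server error. A minimal sketch, assuming you want automatic retries, mounts urllib3's `Retry` policy on a `requests.Session`. The helper names `make_session` and `fetch_json`, the retry count, and the list of retried status codes are illustrative choices, not part of the original exercise.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Return a Session that retries transient failures on HTTPS requests (hypothetical helper)."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,                      # wait grows exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # retry on rate limiting and server errors
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and return the decoded JSON body, raising HTTPError on a bad status."""
    with make_session() as session:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response.json()
```

With these helpers, the fetch step becomes `covid_data = pd.DataFrame(fetch_json(api_url))`; an HTTP error then raises an exception instead of leaving `covid_data` undefined.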

The resulting DataFrame contains COVID-19 case numbers by state, including fields such as `state` (the two-letter postal abbreviation, e.g. `NY`), `positive` (cumulative positive cases), `death` (cumulative deaths), and more.
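The API returns many more columns than this exercise uses. A minimal sketch that keeps only the fields used later and fails early if one is missing; the helper name `select_case_columns` and the `COLUMNS` list are hypothetical, and the test records are made-up placeholders, not real API data.

```python
import pandas as pd

# Columns used later in this exercise; the API response contains many others.
COLUMNS = ["state", "positive", "death"]


def select_case_columns(records: list[dict]) -> pd.DataFrame:
    """Build a DataFrame from API records, keeping only the columns used downstream."""
    df = pd.DataFrame(records)
    missing = [c for c in COLUMNS if c not in df.columns]
    if missing:
        raise KeyError(f"API response is missing expected columns: {missing}")
    df = df[COLUMNS].copy()
    # Coerce counts to numbers so any missing or non-numeric value becomes NaN.
    df["positive"] = pd.to_numeric(df["positive"], errors="coerce")
    df["death"] = pd.to_numeric(df["death"], errors="coerce")
    return df
```

Usage: `covid_data = select_case_columns(response.json())`.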


**Reading Data from a CSV File**

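Now, let's read the population data from a CSV file. This data will later be merged with the COVID-19 case data to calculate cases per capita. Before running the following code, upload `state_population.csv` to your Google Colab environment (or place it in your working directory if you run the notebook locally). The merge in Step 2 requires the file to have a `state` column whose values match the API's two-letter abbreviations, and the per-capita calculation requires a `population` column.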

Here’s how to read a CSV file using **Pandas**:


```python
# Load the state population data from a CSV file
population_data = pd.read_csv('state_population.csv')

# Display the first few rows of the DataFrame
print(population_data.head())
```


## **Step 2: Data Transformation**

Once we’ve extracted the data from both sources (API and CSV), we need to transform it. The key transformation here is merging the datasets and calculating cases per capita.


**Merging Data from Different Sources**

To merge the COVID-19 case data with the population data, we join on their common `state` column using Pandas' `merge()` function.


```python
# Merge the COVID-19 case data with the population data on the shared 'state' column
merged_data = pd.merge(covid_data, population_data, on='state')

# Display the merged data
print(merged_data.head())
```


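This creates a DataFrame that contains both the COVID-19 case data and the population for each state. Because `merge()` performs an inner join by default, any state that appears in only one of the two sources (for example, a territory such as `PR` that the API reports but the CSV may not list) is dropped from the result.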
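Because the inner join drops unmatched rows without warning, it can help to check which states failed to match before trusting the result. A minimal sketch using `merge()`'s `how="outer"` and `indicator=True` options; `merge_with_report` is a hypothetical helper, and the state codes in the test are placeholders.

```python
import pandas as pd


def merge_with_report(cases: pd.DataFrame, population: pd.DataFrame, key: str = "state"):
    """Outer-merge on `key`; return matched rows plus keys found on only one side."""
    merged = pd.merge(cases, population, on=key, how="outer", indicator=True)
    # The `_merge` column records whether each row came from the left, right, or both frames.
    only_cases = merged.loc[merged["_merge"] == "left_only", key].tolist()
    only_population = merged.loc[merged["_merge"] == "right_only", key].tolist()
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    return matched, only_cases, only_population
```

Usage: `merged_data, unmatched_api, unmatched_csv = merge_with_report(covid_data, population_data)`, then print the two lists. Note that the outer join turns integer columns containing gaps into floats, so matched counts may appear as `float64`.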

**Calculating Cases Per Capita**

Now that the data is merged, we can normalize the case counts by population. Strictly, "per capita" means per person; here we scale the rate to cases per 100,000 people, which gives more readable numbers.
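The calculation below computes the following rate for each state $s$. The worked numbers are illustrative, not taken from the dataset.

$$
\text{cases\_per\_capita}_s = \frac{\text{positive}_s}{\text{population}_s} \times 100{,}000
$$

For example, a state with 50,000 positive cases and 2,000,000 residents gives $\frac{50{,}000}{2{,}000{,}000} \times 100{,}000 = 0.025 \times 100{,}000 = 2{,}500$ cases per 100,000 people.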


```python
# Calculate cases per capita (cases per 100,000 people)
merged_data['cases_per_capita'] = (merged_data['positive'] / merged_data['population']) * 100000

# Display the updated DataFrame
print(merged_data[['state', 'positive', 'population', 'cases_per_capita']].head())
```

This transformation adds a new column, `cases_per_capita`, that gives a normalized view of the number of COVID-19 cases relative to the state's population.
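The one-line calculation produces `inf` when a population value is 0 and raises a `TypeError` if the column was read as text. A defensive sketch, assuming you prefer NaN for any missing, non-numeric, or non-positive population; `cases_per_100k` is a hypothetical helper, and the test values are made up.

```python
import pandas as pd


def cases_per_100k(cases: pd.Series, population: pd.Series) -> pd.Series:
    """Return cases per 100,000 people; NaN where population is missing or not positive."""
    population = pd.to_numeric(population, errors="coerce")
    # where() replaces non-positive populations with NaN, so those rates become NaN, not inf.
    return cases * 100_000 / population.where(population > 0)
```

Usage: `merged_data['cases_per_capita'] = cases_per_100k(merged_data['positive'], merged_data['population'])`, after which `merged_data.sort_values('cases_per_capita', ascending=False)` ranks states by rate.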


## **Step 3: Loading the Integrated Data into MongoDB**

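Finally, load the integrated dataset into a MongoDB collection for storage.

Make sure MongoDB is running locally, or create a cluster on MongoDB Atlas and replace the `localhost` connection string below with your cluster's `mongodb+srv://` connection string. We connect to MongoDB using PyMongo.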


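```python
from pymongo import MongoClient

# Connect to MongoDB (for Atlas, use your cluster's connection string instead)
client = MongoClient('mongodb://localhost:27017/')
db = client['MongoDB_db']  # MongoDB creates the database on first write if it does not exist

# Insert the integrated data into the covid_data collection, one document per state
db.covid_data.insert_many(merged_data.to_dict('records'))

print("Data loaded into MongoDB successfully!")
```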


# **Congratulations on completing this practical exercise!**