## **Extracting Website Data with APIs and JSON: A Step-by-Step Guide**

# Step 1: Get Your API
First, find an API to work with. For free options, search for "free API" on Google.

For this tutorial, I will be using this one:
http://universities.hipolabs.com/search?country=United+States

The above API allows users to get a list of universities in a specified country.

# Step 2: Import Necessary Libraries

Before we can fetch and process the data from the API, we need to import a few libraries. These libraries will help us handle the HTTP requests, parse the JSON data, and work with the data in a tabular format.

Add the following code to your Python script:

**Explanation:**
import json: This library is used to parse JSON (JavaScript Object Notation) data, which is a common format for data exchange between a client and a server.

import urllib.parse: This module helps with parsing URLs and handling query strings.

import urllib.request: This module allows you to make HTTP requests to fetch data from the internet.

import pandas as pd: Pandas is a powerful data manipulation and analysis library in Python. We will use it to organize the data in a tabular format (dataframes).


In [1]:
import json
import urllib.parse
import urllib.request
import pandas as pd

# Step 3: Construct the API Request URL
Now, we need to construct the URL for our API request based on the user input. This will allow us to get the list of universities for the specified country.

Add the following code to your script:

Explanation:
location = input("Enter the name of the country whose universities you are interested in: "): This line prompts the user to input the name of the country. The entered value is stored in the variable location.

q = {"country": location}: We create a dictionary with the key "country" and the value as the user-provided location. This dictionary will be used to construct the query string for the API request.

url = "http://universities.hipolabs.com/search?" + urllib.parse.urlencode(q): This line constructs the full API request URL by appending the URL-encoded query string to the base URL. 

The urllib.parse.urlencode(q) function converts the dictionary q into a query string format (e.g., country=United+States).


In [2]:
location = input("Enter the name of the country whose universities you are interested in: ")
q = {"country": location}
url = "http://universities.hipolabs.com/search?" + urllib.parse.urlencode(q)

Enter the name of the country whose universities you are interested in: canada
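
To see what urlencode produces without typing input each time, the URL construction can be wrapped in a small function. This is only a sketch: the name build_search_url and the whitespace stripping are assumptions added for illustration and are not part of the original script.

```python
import urllib.parse

BASE_URL = "http://universities.hipolabs.com/search?"


def build_search_url(country):
    """Return the search URL for a country name (hypothetical helper)."""
    # Strip stray spaces the user may have typed around the name.
    query = {"country": country.strip()}
    # urlencode turns spaces into "+" and percent-encodes other unsafe characters.
    return BASE_URL + urllib.parse.urlencode(query)


print(build_search_url("United States"))
# http://universities.hipolabs.com/search?country=United+States
print(build_search_url(" canada "))
# http://universities.hipolabs.com/search?country=canada
```

Building the query from a dictionary, rather than gluing strings together by hand, means names with spaces or accented letters are escaped for us.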


# Step 4: Fetch Data from the API

With the API request URL ready, we can now fetch the data from the API. Add the following code to your script:

Explanation:

with urllib.request.urlopen(url) as response:: This line opens the URL and sends a request to the API. The urlopen function returns a response object, which we can read from. The with statement ensures that the response is properly closed after we're done with it.

page = json.loads(response.read().decode()): This line does three things in sequence.

response.read(): This reads the raw data (bytes) from the response.

.decode(): This decodes the raw bytes into a string, using UTF-8 by default.

json.loads(): This parses the JSON string into a Python dictionary or list (depending on the JSON structure). This API returns a JSON array, so page ends up as a list with one dictionary per university.


In [3]:
with urllib.request.urlopen(url) as response:
    page = json.loads(response.read().decode())
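
Network requests can fail, so it may help to wrap the fetch in a function that handles errors. The sketch below is one possible approach, not part of the original script: the name fetch_json, the 10-second timeout, and the choice to return None on failure are all assumptions. So that it runs offline, the demonstration uses a data: URL carrying a made-up payload instead of calling the real API.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch a URL and parse its body as JSON; return None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset the server declares, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return json.loads(response.read().decode(charset))
    except urllib.error.URLError as err:
        # Covers unreachable hosts and HTTP error statuses
        # (HTTPError is a subclass of URLError).
        print(f"Request failed: {err}")
    except json.JSONDecodeError as err:
        print(f"Response was not valid JSON: {err}")
    return None


# Offline demonstration with a made-up record (hypothetical values).
sample = ('[{"name": "Example University", "state-province": null, '
          '"web_pages": ["https://example.edu"]}]')
demo_url = "data:application/json," + urllib.parse.quote(sample)

page = fetch_json(demo_url)
print(type(page).__name__, len(page))
```

With a real endpoint, the call would simply be page = fetch_json(url), followed by a check that page is not None before continuing.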

# Step 5: Extract and Store Specific Information
Now that we have the data, we need to extract specific pieces of information (university name, state/province, and website) and store them in lists. Add the following code to your script:

Explanation:
Lists Initialization: We initialize three empty lists (school_name, province, and website) to store the extracted information.

school_name = []: List to store university names.
province = []: List to store state or province names.
website = []: List to store university websites.

For Loop: We iterate over each item (university) in the page list.

for info in page:: Iterates through each university's information in the JSON data.

Extract Information:

name = info.get("name", "Nil"): Retrieves the university's name. If the key is missing, it defaults to "Nil".
location = info.get("state-province") or "Nil": Retrieves the state or province of the university. If the key is missing or its value is null (None in Python), it defaults to "Nil".
web = (info.get("web_pages") or ["Nil"])[0]: Retrieves the first website URL from the list of web pages. If the list is missing or empty, ["Nil"] is used in its place, so the result is "Nil". The fallback is a list rather than the plain string "Nil" because [0] on a string would return only its first character, "N".

Append to Lists:

school_name.append(name): Adds the university name to the school_name list.
province.append(location): Adds the state or province to the province list.
website.append(web): Adds the website URL to the website list.


In [4]:
school_name = []
province = []
website = []

for info in page:
    name = info.get("name", "Nil")
    # "or" also replaces a null (None) value, not just a missing key
    location = info.get("state-province") or "Nil"
    # Fall back to ["Nil"] so that [0] is safe when the list is missing or empty
    web = (info.get("web_pages") or ["Nil"])[0]

    school_name.append(name)
    province.append(location)
    website.append(web)
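
A note on missing values: dict.get only falls back to its default when the key is absent. A key that is present with a null value comes back as None, and indexing an empty list with [0] raises an IndexError. The sketch below shows one way to guard against all three cases. The helper name extract_fields and the sample records are made up for illustration.

```python
def extract_fields(info, default="Nil"):
    """Return (name, state/province, first website) for one record."""
    # "or default" also replaces None and empty strings, not just missing keys.
    name = info.get("name") or default
    state = info.get("state-province") or default
    # Fall back to a one-item list so that [0] is always safe.
    web = (info.get("web_pages") or [default])[0]
    return name, state, web


# Made-up records covering the awkward cases.
records = [
    {"name": "Example University", "state-province": "Ontario",
     "web_pages": ["https://example.edu", "https://www.example.edu"]},
    {"name": "Sample College", "state-province": None, "web_pages": []},
    {"name": "Keyless Institute"},
]

for record in records:
    print(extract_fields(record))
# ('Example University', 'Ontario', 'https://example.edu')
# ('Sample College', 'Nil', 'Nil')
# ('Keyless Institute', 'Nil', 'Nil')
```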
    

# Step 6: Create a DataFrame
Finally, we will organize the extracted data into a pandas DataFrame. A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

Add the following code to your script:

Explanation:
pd.DataFrame(): This function creates a DataFrame from the provided data.
{"University": school_name, "State": province, "Uni Site": website}: We pass a dictionary where the keys are the column names ("University", "State", and "Uni Site") and the values are the lists containing the data for each column.
The resulting DataFrame will have three columns: "University", "State", and "Uni Site", populated with the respective data from the lists.

In [5]:
table = pd.DataFrame({"University": school_name,
                      "State" : province,
                      "Uni Site" : website})

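As an alternative sketch, pandas also accepts a list of dictionaries, one per row, instead of one list per column. The rows below are made-up stand-ins for the extracted data, and the variable names are assumptions for illustration. Both constructions should produce the same table.

```python
import pandas as pd

# Made-up rows standing in for the cleaned API records.
rows = [
    {"University": "Example University", "State": "Ontario",
     "Uni Site": "https://example.edu"},
    {"University": "Sample College", "State": "Nil", "Uni Site": "Nil"},
]

# Each dictionary becomes one row; the keys become the column names.
table_from_rows = pd.DataFrame(rows)

# The same table built column by column, from a dictionary of lists.
table_from_columns = pd.DataFrame({
    "University": [row["University"] for row in rows],
    "State": [row["State"] for row in rows],
    "Uni Site": [row["Uni Site"] for row in rows],
})

print(table_from_rows.equals(table_from_columns))
print(table_from_rows.shape)
```

The row-oriented form keeps each university's values together, which can make it harder for the columns to drift out of step if one append is forgotten.
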
# Step 7: Display the Data
To verify that our DataFrame has been created correctly, we will print the first 5 rows of the DataFrame. Add the following code to your script:

Explanation:
print(table.head(5)): The head() function in pandas returns the first n rows of the DataFrame (5 if no argument is given). By passing 5 as an argument, we request the first 5 rows. The print() function then prints these rows to the console.
This step allows us to quickly inspect the DataFrame and ensure that the data has been correctly extracted and organized.

In [8]:
print(table.head(5))

              University             State                          Uni Site
0  Cégep de Saint-Jérôme            Quebec            https://www.cstj.qc.ca
1        Lambton College           Ontario     https://www.lambtoncollege.ca
2      Acadia University       Nova Scotia            http://www.acadiau.ca/
3      Algonquin College           Ontario  http://www.algonquincollege.com/
4         Ashton College  British Columbia     http://www.ashtoncollege.com/


# Step 8: Save the Data
You will need to run the line of code in this step yourself; just follow the instructions below.

Now that we have extracted and organized the data into a DataFrame, we can save it to a CSV file to view the full details or share it with others.

To save the DataFrame to a CSV file, follow these steps:

Choose a Name: Decide on a name for your CSV file. This will be used to identify the file on your device.

Use the to_csv Method: Use the to_csv method available in pandas to save the DataFrame to a CSV file.

Here's how to do it:

#### *table.to_csv("universities_data.csv")*

Replace "universities_data.csv" with the desired name for your CSV file. Make sure to keep the .csv extension at the end of the filename.

Run the Code: Execute the above line of code in your Python environment. This will generate a CSV file with the specified name containing the data from the DataFrame.

Locate the File: Once the code is executed successfully, you can find the CSV file in the current working directory, which is usually the folder your Python script or notebook is running from.

Now, you have successfully saved the extracted university data to a CSV file on your device.
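
By default, to_csv also writes the row labels (0, 1, 2, ...) as an extra first column. If you do not want them in the file, passing index=False leaves them out. The sketch below uses a small made-up table and returns the CSV as text rather than writing a file, so the result can be inspected directly. The sample values are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up rows standing in for the real table.
table = pd.DataFrame({
    "University": ["Example University", "Sample College"],
    "State": ["Ontario", "Nil"],
    "Uni Site": ["https://example.edu", "Nil"],
})

# With no path given, to_csv returns the CSV text instead of writing a file.
# index=False leaves out the 0, 1, 2, ... row labels.
csv_text = table.to_csv(index=False)
print(csv_text)

# Reading the text back confirms that the columns survive the round trip.
restored = pd.read_csv(io.StringIO(csv_text))
print(list(restored.columns))
```

To write to disk instead, the equivalent call would be table.to_csv("universities_data.csv", index=False).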

In [7]:
#Run the code Here:




# Final Note 

Congratulations on completing this tutorial! You've learned how to extract data from a web API, process it using Python, and organize it into a tabular format. By following these steps, you've gained valuable skills that can be applied to a wide range of data extraction and analysis tasks.

Remember, the ability to work with APIs and manipulate data programmatically is a powerful skill in today's data-driven world. Whether you're a student, a researcher, or a professional in any field, these skills will serve you well in your endeavors.

Feel free to explore further, experiment with different APIs, and continue building your Python skills. Don't hesitate to reach out if you have any questions or need assistance.

*Keep coding, keep learning, and keep exploring new possibilities! Happy coding!*