## Reading Files into Dataframes
Let's dive into how to use Pandas to load CSV files and perform basic operations on the data.

We start by importing the Pandas library using the import pandas as pd statement.

To load a CSV file into a DataFrame, we use the pd.read_csv() function and pass the path to the CSV file as the argument. Replace 'path/to/your/file.csv' with the actual file path.

If you are using **Colab**, upload the file first:

1. Click the Files icon in the left sidebar to open the Files explorer.
2. Click the Upload button, select the file on your local machine, and click Open.
3. Copy the file path and pass it to pd.read_csv().

After loading the data, we can display the first few rows of the DataFrame using the df.head() method. This provides a quick overview of the data structure and column names.

In [None]:
import pandas as pd

# Load a CSV file into a DataFrame

# Set csv_file to your own path and file name, e.g. 'path/to/your/file.csv'

csv_file = '/content/kaggle-house-price-data-set.csv'

df = pd.read_csv(csv_file)

# Display the first few rows of the DataFrame
print(df.head())
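
To try pd.read_csv() without a file on disk, the sketch below reads a small in-memory CSV through io.StringIO. The column names and values are made up for illustration and are not taken from the house price dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file (values are invented).
csv_text = """Id,Neighborhood,SalePrice
1,NorthSide,208500
2,OldTown,181500
3,Riverside,223500
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows (the default is 5).
print(sample_df.head(2))

# shape is (rows, columns); dtypes shows the type inferred for each column.
print(sample_df.shape)
print(sample_df.dtypes)
```

Checking shape and dtypes right after loading is a quick way to confirm that the file was parsed the way you expected.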


## WGET

A lot of data can be sourced online. Let's pull a data file directly from the web.

To retrieve the dataset, let's use **wget**, a popular command-line tool for downloading files that does not require any additional libraries to be loaded or installed. Recall that we use this file in our data wrangling course, so let's get the raw file from the GitHub repository.

In [None]:
# WGET with HTTPS file path; the file keeps its remote name

!wget https://raw.githubusercontent.com/odsc2015/Data-Wrangling-With-SQL/main/kaggle-house-price-data-set.csv

# Rename the retrieved file using the -O parameter
!wget -O second_house_price_set.csv https://raw.githubusercontent.com/odsc2015/Data-Wrangling-With-SQL/main/kaggle-house-price-data-set.csv

# Load the dataset
house_df = pd.read_csv('second_house_price_set.csv')

house_df.columns
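
Outside Colab, wget may not be installed. As one possible alternative, the sketch below uses urlretrieve from Python's standard library. The helper name fetch_file and the sample file are assumptions made for illustration. The demo fetches from a local file:// URL so that it runs offline; an https:// URL could be passed instead.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def fetch_file(url, dest):
    """Download url to dest and return dest as a Path (hypothetical helper)."""
    saved_path, _headers = urlretrieve(url, dest)
    return Path(saved_path)


# Offline demo: create a tiny stand-in file and fetch it through a file:// URL.
work_dir = Path(tempfile.mkdtemp())
source = work_dir / "source.csv"
source.write_text("Id,SalePrice\n1,208500\n")

downloaded = fetch_file(source.as_uri(), work_dir / "copy.csv")
print(downloaded.name)
print(downloaded.read_text())
```

Calling fetch_file with the GitHub raw URL used above would save the dataset locally. pd.read_csv() also accepts a URL directly, which avoids the intermediate file.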


## Describe Our File


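Recall that we can use the df.describe() method to get summary statistics of the data, such as count, mean, standard deviation, minimum, and maximum values for numeric columns. Passing include='all', as the code for this section does, also summarizes non-numeric columns.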
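
To see the difference between the default summary and include='all', here is a minimal sketch on a tiny DataFrame. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
homes = pd.DataFrame(
    {
        "SalePrice": [100000, 200000, 300000, 400000],
        "Style": ["Ranch", "Ranch", "Colonial", "Ranch"],
    }
)

# Default: numeric columns only (count, mean, std, min, quartiles, max).
numeric_summary = homes.describe()
print(numeric_summary)

# include='all' adds non-numeric columns with unique, top, and freq rows.
full_summary = homes.describe(include="all")
print(full_summary)
```

In the full summary, statistics that do not apply to a column (for example the mean of a text column) show up as NaN.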

Accessing a specific column is straightforward: put the column name in square brackets, as in house_df['SalePrice'].

We can filter the data based on conditions using boolean indexing. In the example, house_df['SalePrice'] > 500000 filters the DataFrame to include only rows where the value in the 'SalePrice' column is greater than 500000. The filtered data is stored in the filtered_data variable.

Finally, Pandas provides a wide range of functionalities for data manipulation and analysis, allowing you to perform various operations like data transformations, aggregations, merging datasets, and more.

In [None]:

# Get summary statistics of the data
print(house_df.describe(include='all'))

# Access specific columns
print(house_df['SalePrice'])

# Filter data based on conditions
filtered_data = house_df[house_df['SalePrice'] > 500000]
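
Boolean indexing is easier to follow when the intermediate mask is visible. The sketch below uses a small invented DataFrame; the column names mirror the kind of columns a house price dataset might have, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical data; the values are invented for illustration.
homes = pd.DataFrame(
    {
        "SalePrice": [450000, 520000, 610000, 380000],
        "YearBuilt": [1995, 2008, 1987, 2015],
    }
)

# A comparison on a column yields a boolean Series, one value per row.
is_expensive = homes["SalePrice"] > 500000
print(is_expensive.tolist())

# Indexing with that Series keeps only the rows marked True.
expensive = homes[is_expensive]
print(expensive)

# Combine conditions with & (and) or | (or); each needs its own parentheses.
expensive_and_recent = homes[
    (homes["SalePrice"] > 500000) & (homes["YearBuilt"] >= 2000)
]
print(len(expensive_and_recent))
```

The parentheses around each condition matter because & and | bind more tightly than the comparison operators in Python.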

## Web Scraping

Let's demonstrate the basic process of web scraping using the BeautifulSoup library.

We start by sending a GET request to the website of interest using the requests.get() function. In this example, we scrape data from 'https://www.example.com', but you can replace it with the URL of the website you want to scrape.

Next, we create a BeautifulSoup object by passing the response content (response.content) and the parser to use (in this case, 'html.parser'). The BeautifulSoup library parses the HTML content and provides methods for extracting specific data.

We use the find() method of the BeautifulSoup object to locate specific HTML elements. In this example, we extract the text within the first <h1> tag using soup.find('h1').text. We also extract all paragraphs (<p> tags) on the page using soup.find_all('p').

Finally, we print the extracted data to the console.

In [None]:
import requests
from bs4 import BeautifulSoup

# Send a GET request to the website
url = 'https://www.example.com'
response = requests.get(url)

# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Extract data from specific elements
title = soup.find('h1').text
paragraphs = soup.find_all('p')

# Print the extracted data
print("Title:", title)
print("Paragraphs:")
for p in paragraphs:
    print(p.text)
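
Parsing can also be practiced offline by handing BeautifulSoup an HTML string. The sketch below uses made-up HTML and checks for a missing element first, because calling .text on the result of find() raises an AttributeError when the tag is not on the page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<html>
  <body>
    <h1>Sample Page</h1>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, so check before reading the text.
heading = soup.find("h1")
title = heading.get_text(strip=True) if heading is not None else None

# find_all() returns a list-like result that is empty when nothing matches.
paragraph_texts = [p.get_text(strip=True) for p in soup.find_all("p")]

print(title)
print(paragraph_texts)
```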

### Quick Tip

Most browsers allow you to right-click and inspect webpage elements. There you can find the HTML tags you seek!

## Web Scraping a Stock Price
Let's try something more interesting: scraping Apple's stock price from Google Finance using BeautifulSoup. This takes several steps:

1. Send an HTTP request to Google Finance's Apple stock page.
2. Parse the HTML content of the page using BeautifulSoup.
3. Locate and extract the relevant stock price element from the parsed HTML.

To find the correct element, inspect the Apple stock price on this page: https://www.google.com/finance/quote/AAPL:NASDAQ

It should look something like this:

![Stock Price](https://drive.google.com/uc?id=1mp_jWO2BfnZhsfc4Ob7dk83VyhFvGeCP )



In [None]:
import requests
from bs4 import BeautifulSoup

# Send an HTTP GET request to the Google Finance page for Apple's stock
url = "https://www.google.com/finance/quote/AAPL:NASDAQ"
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the element containing the stock price
    price_element = soup.find("div", class_="YMlKec fxKbKc")

    if price_element:
        # Extract and print the stock price
        stock_price = price_element.text.strip()
        print(f"Apple's stock price: {stock_price}")
    else:
        print("Stock price element not found on the page.")
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")

## Results Review

Here's how we got the stock price:

1. We import two libraries: requests for making HTTP requests to the website, and BeautifulSoup for parsing HTML content.
2. We define the URL of Apple's stock page on Google Finance and send an HTTP GET request to that URL using the requests.get() function. The response is stored in the response variable.
3. We check if the HTTP request was successful by verifying that the status code is 200, which indicates a successful response.
4. We create a BeautifulSoup object (soup) by parsing the HTML content of the page. The response.text contains the HTML content, and 'html.parser' is the parser used to parse the HTML.
5. We use BeautifulSoup's find() method to locate the HTML element that contains the stock price. In this case, we search for a <div> element with the class "YMlKec fxKbKc". Please note that this class selector may change over time, and it's essential to inspect the page's source code to find the correct selector.
6. If the price_element is found (i.e., not None), we extract the text within the element using price_element.text and remove any leading or trailing whitespace with strip(). Then, we print the stock price.
7. If the price_element is not found, we print a message indicating that the stock price element was not found on the page.
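
The class-based lookup is sensitive to how the class attribute is written in the page. The sketch below uses a made-up HTML snippet with the same two class names in reverse order, and an invented price, to compare find() with a CSS selector and to turn the scraped text into a number. It is an illustration under those assumptions, not a description of Google Finance's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet that mimics the markup around the price element.
html = '<div class="fxKbKc YMlKec">$189.84</div>'
soup = BeautifulSoup(html, "html.parser")

# class_ with a space-separated string matches the exact attribute value,
# so the different class order in this snippet is not matched.
exact_match = soup.find("div", class_="YMlKec fxKbKc")
print(exact_match)

# A CSS selector matches both classes regardless of their order.
price_element = soup.select_one("div.YMlKec.fxKbKc")
price_text = price_element.get_text(strip=True)

# Strip the currency symbol and thousands separators before converting.
price = float(price_text.replace("$", "").replace(",", ""))
print(price)
```

Converting the text to a float makes the value usable in calculations, which the raw string is not.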


REMEMBER - You must respect the privacy and copyright requirements of any website you scrape data from!

## APIs

API stands for Application Programming Interface. It's a set of rules and protocols that allows different software applications to communicate with each other. APIs are used to request and exchange data between systems, making it possible for your application to interact with external services, retrieve data, and perform various tasks.

## APIs in Action
We can use an API to retrieve data from an online data provider using the requests library in Python.

We start by sending a GET request to the API endpoint of interest using the requests.get() function. The example uses 'https://jsonplaceholder.typicode.com/users/1'; replace it with the actual API endpoint URL you want to query.

We check the status code of the response using response.status_code. A status code of 200 indicates a successful request.

If the request is successful, we extract the data from the response using response.json(). This converts the response content (usually in JSON format) into a Python object that we can work with.

Once we have the data, we can process and analyze it further based on our specific requirements.



In [None]:
import requests

# Send a GET request to the API
url = "https://jsonplaceholder.typicode.com/users/1"  # Public sample API that doesn't require an API key
response = requests.get(url)

# Check the status code
if response.status_code == 200:
    # Extract data from the response
    data = response.json()
    print(data)

    # Process and analyze the data
    # ...
else:
    print("Error:", response.status_code)
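
As one way to fill in the processing step, the sketch below works on a dictionary like the one response.json() returns and flattens it into a DataFrame. The payload is hypothetical: its field names and values are invented for illustration and are not the real response of any API.

```python
import json

import pandas as pd

# Hypothetical payload shaped like a user record; values are invented.
payload = json.loads(
    """
    {
      "id": 1,
      "name": "Ada Example",
      "email": "ada@example.com",
      "address": {"city": "Springfield", "zipcode": "00000"}
    }
    """
)

# response.json() would return a dict like this one; nested keys use brackets.
print(payload["address"]["city"])

# json_normalize flattens nested dicts into dotted column names.
users_df = pd.json_normalize(payload)
print(users_df.columns.tolist())
```

Once the data is in a DataFrame, the same describe() and filtering techniques used on the CSV data apply.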

# Exercises

## Exercise 1

Using wget, retrieve the following file from GitHub and examine its contents:

https://raw.githubusercontent.com/odsc2015/Data-Wrangling-With-SQL/main/new_customers_attempt_a.csv

## Exercise 2

Using the requests and BeautifulSoup libraries, get the stock price for another NASDAQ-listed stock, such as NVIDIA, whose ticker is NVDA.
