
# Step 1 - Import the necessary components
[10 Minutes to Pandas](https://pandas.pydata.org/docs/user_guide/10min.html)



In [None]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Step 2 - Read a CSV file with Pandas
**Pandas** is a Python library for working with tabular data. Imagine a big table of information, like a spreadsheet in Excel or Google Sheets: Pandas loads that table into a `DataFrame`, which you can then read, filter, change, and analyze with a few lines of code.
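To make "read, change, and analyze" concrete, here is a minimal sketch that builds a small `DataFrame` by hand instead of downloading it. The values are the first five rows of `hw_200.csv` that this notebook loads in the next cell; the column names are simplified, and the `Height(cm)` column is a hypothetical addition for illustration.

```python
import pandas as pd

# First five rows of hw_200.csv, typed in by hand (column names simplified)
data = pd.DataFrame({
    "Index": [1, 2, 3, 4, 5],
    "Height(Inches)": [65.78, 71.52, 69.4, 68.22, 67.79],
    "Weight(Pounds)": [112.99, 136.49, 153.03, 142.34, 144.3],
})

# Change the table: add a column converting inches to centimetres (1 in = 2.54 cm)
data["Height(cm)"] = data["Height(Inches)"] * 2.54

# Analyze the table: count, mean, std, min, quartiles, and max of each numeric column
print(data.describe())

# Filter the table: keep only the rows that match a condition
tall = data[data["Height(Inches)"] > 68]
print(tall)
```

The boolean expression inside `data[...]` produces one `True`/`False` per row, and Pandas keeps only the rows marked `True`.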

Try another dataset by changing `url` to:
`'https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv'`

In [None]:
# Reading a CSV file

url = 'https://people.sc.fsu.edu/~jburkardt/data/csv/hw_200.csv'
data = pd.read_csv(url)
data.head()

|   | Index | "Height(Inches)" | "Weight(Pounds)" |
|---|-------|------------------|------------------|
| 0 | 1     | 65.78            | 112.99           |
| 1 | 2     | 71.52            | 136.49           |
| 2 | 3     | 69.4             | 153.03           |
| 3 | 4     | 68.22            | 142.34           |
| 4 | 5     | 67.79            | 144.3            |
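The stray quotation marks in the column names above suggest that the file's header is written with a space after each comma, e.g. `"Index", "Height(Inches)", "Weight(Pounds)"`; that header layout is an assumption inferred from the output, not checked against the file. The sketch below reproduces the effect on a small inline excerpt and shows how `read_csv`'s `skipinitialspace=True` option yields clean column names.

```python
import io

import pandas as pd

# Assumed excerpt of hw_200.csv: quoted headers with a space after each comma
raw = '''"Index", "Height(Inches)", "Weight(Pounds)"
1, 65.78, 112.99
2, 71.52, 136.49
'''

# Default parsing: the leading space stops the quotes from being recognized,
# so the space and the quote characters end up inside the column names
messy = pd.read_csv(io.StringIO(raw))
print(list(messy.columns))

# skipinitialspace=True skips spaces after each delimiter, so the quotes are parsed normally
clean = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(list(clean.columns))  # ['Index', 'Height(Inches)', 'Weight(Pounds)']
```

With clean names you can write `clean["Height(Inches)"]` instead of having to match the embedded space and quotes exactly.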


# Step 3 - Fetch data from an API with Requests
**Requests** is a Python library for sending HTTP requests to websites and web APIs. It's like your web browser, but for your Python programs: you ask a server for data, and Requests hands you back its response (status code, headers, and body).
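A slightly more careful way to "ask a website for data" is to set a timeout and to check the HTTP status before decoding the body. The sketch below wraps that in a hypothetical helper, `get_json`. So that it runs without internet access, it also starts a tiny local server (`FakeRatesHandler`, likewise hypothetical) that returns a trimmed, made-up copy of the exchange-rate payload; with a real API you would call `get_json(api_url)` directly.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


def get_json(url, timeout=10):
    """Hypothetical helper: fetch `url` and decode its JSON body, raising on failure."""
    response = requests.get(url, timeout=timeout)  # give up after `timeout` seconds
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx status codes
    return response.json()       # parse the JSON text into Python dicts/lists


class FakeRatesHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real API so the sketch runs offline."""

    def do_GET(self):
        body = json.dumps({"base": "USD", "rates": {"USD": 1, "EUR": 0.922}}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 asks the operating system for any free port
server = HTTPServer(("127.0.0.1", 0), FakeRatesHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    result = get_json(f"http://127.0.0.1:{server.server_address[1]}/latest/USD")
finally:
    server.shutdown()      # stop the serve_forever loop
    server.server_close()  # release the listening socket

print(result["rates"]["EUR"])  # 0.922
```

Without a timeout, `requests.get` can wait indefinitely on an unresponsive server, and without `raise_for_status()` an error page may only surface later as a confusing JSON decoding error.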

In [None]:
# Fetching data from a public API
api_url = 'https://api.exchangerate-api.com/v4/latest/USD'
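response = requests.get(api_url, timeout=10)  # stop waiting after 10 seconds
response.raise_for_status()  # raise an error if the request failed (4xx/5xx)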
data = response.json()
data


{'provider': 'https://www.exchangerate-api.com',
 'terms': 'https://www.exchangerate-api.com/terms',
 'base': 'USD',
 'date': '2024-07-25',
 'time_last_updated': 1721865601,
 'rates': {'USD': 1,
  'AED': 3.67,
  'AFN': 70.69,
  'ALL': 92.52,
  'AMD': 387.97,
  'ANG': 1.79,
  'AOA': 890.03,
  'ARS': 928.67,
  'AUD': 1.52,
  'AWG': 1.79,
  'AZN': 1.7,
  'BAM': 1.8,
  'BBD': 2,
  'BDT': 117.46,
  'BGN': 1.8,
  'BHD': 0.376,
  'BIF': 2878.21,
  'BMD': 1,
  'BND': 1.34,
  'BOB': 6.92,
  'BRL': 5.61,
  'BSD': 1,
  'BTN': 83.74,
  'BWP': 13.58,
  'BYN': 3.26,
  'BZD': 2,
  'CAD': 1.38,
  'CDF': 2829.98,
  'CHF': 0.885,
  'CLP': 946.24,
  'CNY': 7.27,
  'COP': 4013.12,
  'CRC': 528.75,
  'CUP': 24,
  'CVE': 101.71,
  'CZK': 23.43,
  'DJF': 177.72,
  'DKK': 6.88,
  'DOP': 59.13,
  'DZD': 134.55,
  'EGP': 48.35,
  'ERN': 15,
  'ETB': 57.67,
  'EUR': 0.922,
  'FJD': 2.25,
  'FKP': 0.775,
  'FOK': 6.88,
  'GBP': 0.775,
  'GEL': 2.72,
  'GGP': 0.775,
  'GHS': 15.57,
  'GIP': 0.775,
  'GMD': 66.9,
 

In [None]:
# What does `data` look like?
print(type(data))
print(data)

<class 'dict'>


In [None]:
# print the keys
print(data.keys())
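The `rates` value is itself a dictionary mapping currency codes to numbers, which fits neatly into Pandas from Step 2. The sketch below uses a handful of rates copied from the response above (a hypothetical subset, so it runs offline); in the notebook, `pd.Series(data['rates'])` would build the same structure for every currency.

```python
import pandas as pd

# A few rates copied from the API response above (base currency: USD)
rates = {"USD": 1, "EUR": 0.922, "GBP": 0.775, "CAD": 1.38, "AUD": 1.52}

# A dict maps naturally onto a pandas Series: keys become the index, values the data
rates_series = pd.Series(rates, name="per_USD")

# Sort ascending: fewer units per USD means each unit of that currency is worth more
print(rates_series.sort_values())

# Select several currencies at once by label
print(rates_series.loc[["EUR", "GBP"]])
```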



In [None]:
# print specific value
print(data['rates']['EUR'])

0.922
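Each rate is the number of units of that currency per 1 unit of the `base` currency (USD here), so converting an amount is a single multiplication. Indexing with `['EUR']` raises a `KeyError` for an unknown code; the hypothetical helper `convert` below uses `dict.get` to return a clearer error instead. `sample` is a trimmed copy of the response with values taken from the output above.

```python
# Trimmed copy of the API response (values taken from the output above)
sample = {"base": "USD", "rates": {"USD": 1, "EUR": 0.922, "GBP": 0.775}}


def convert(amount, to_currency, response):
    """Hypothetical helper: convert `amount` in the base currency into `to_currency`."""
    rate = response["rates"].get(to_currency)  # .get returns None instead of raising KeyError
    if rate is None:
        raise ValueError(f"No rate for {to_currency!r} (base is {response['base']})")
    return round(amount * rate, 2)  # round to cents


print(convert(100, "EUR", sample))  # 92.2
```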


In [None]:
# print a specific rate and time_last_updated
print(data['rates']['EUR'], data['time_last_updated'])

0.922 1721865601


In [None]:
import datetime

# print a specific rate and format the date for time_last_updated (converted in UTC)
print(data['rates']['EUR'], datetime.datetime.fromtimestamp(data['time_last_updated'], tz=datetime.timezone.utc).strftime('%Y-%m-%d'))

0.922 2024-07-25
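`time_last_updated` is a Unix timestamp: seconds since 1970-01-01 00:00:00 UTC. Called without `tz`, `datetime.datetime.fromtimestamp` converts it using the computer's local time zone, so the printed date can differ between machines. The sketch below shows the same timestamp in UTC and at a fixed UTC-7 offset (chosen only as an example).

```python
import datetime

ts = 1721865601  # time_last_updated from the API response above

utc_time = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
print(utc_time.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2024-07-25 00:00:01 UTC

# The same instant on a UTC-7 clock (e.g. US Pacific daylight time) falls on the previous day
utc_minus_7 = datetime.timezone(datetime.timedelta(hours=-7))
print(datetime.datetime.fromtimestamp(ts, tz=utc_minus_7).strftime("%Y-%m-%d %H:%M:%S"))  # 2024-07-24 17:00:01
```

Converting in UTC gives a date that agrees with the response's own `'date': '2024-07-25'` field.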


# Step 4 - Scrape a web page with BeautifulSoup
**BeautifulSoup** is a Python library for parsing HTML. It turns a messy web page into a tree of tags that you can search, for example to pull out the page title, headings, or links.

In [None]:
# Web Scraping with BeautifulSoup
url = 'https://www.example.com'
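response = requests.get(url, timeout=10)  # stop waiting after 10 seconds
response.raise_for_status()  # raise an error if the page did not load (4xx/5xx)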
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.title.text)
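Beyond `soup.title`, BeautifulSoup can search the whole tag tree. The sketch below parses a small, made-up HTML string (a hypothetical stand-in for a downloaded page, so no network access is needed) and uses `find_all` to collect links and `find` with `class_` to pick out one element.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page
html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Exchange rates</h1>
    <a href="https://www.exchangerate-api.com">Provider</a>
    <a href="/terms">Terms</a>
    <p class="note">Rates are relative to USD.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

print(soup.title.text)                           # Sample Page
print(soup.h1.get_text(strip=True))              # Exchange rates

links = [a["href"] for a in soup.find_all("a")]  # href attribute of every <a> tag
print(links)                                     # ['https://www.exchangerate-api.com', '/terms']

note = soup.find("p", class_="note")             # class_ avoids clashing with Python's `class` keyword
print(note.text)                                 # Rates are relative to USD.
```

On a real page, `soup.find` returns `None` when nothing matches, so check the result before reading `.text`.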
