<h1>Class 10: Introduction to Web Scraping and APIs</h1>

<h2>Understanding web scraping and its applications</h2>

<p><span style="font-size:16px;">It&#39;s important to note that while web scraping can be a powerful tool for data acquisition and analysis, it should be used responsibly and ethically, respecting the rights and policies of the websites being scraped.</span></p>

<p><span style="font-size:16px;">Web scraping is the process of extracting data from websites using automated methods. It involves fetching the HTML content of a web page, parsing and extracting the desired information, and then storing or utilizing that data for various purposes. Here&#39;s an explanation of web scraping and its applications:</span></p>

<ol>
	<li><span style="font-size:16px;">How Web Scraping Works: Web scraping typically follows these steps:</span></li>
</ol>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;">Sending an HTTP request to a website and fetching the HTML content of the desired web page.</span></li>
	<li><span style="font-size:16px;">Parsing the HTML to extract specific elements such as text, images, links, tables, or other structured data.</span></li>
	<li><span style="font-size:16px;">Transforming and cleaning the extracted data to make it usable.</span></li>
	<li><span style="font-size:16px;">Storing the scraped data or using it for analysis, visualization, research, or any other intended purpose.</span></li>
</ul>
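<p><span style="font-size:16px;">The four steps above can be sketched as four small functions. This is a minimal illustration rather than a complete scraper. It uses the <code>requests</code> and BeautifulSoup libraries introduced later in this class, and the function names, the table with <code>id="prices"</code>, and the sample HTML are hypothetical. The demo passes a static HTML string in place of a live download so it runs offline.</span></p>

```python
import csv
import io

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Step 1: send an HTTP GET request and return the page's HTML."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx responses
    return response.text


def parse_rows(html):
    """Step 2: pull the raw cell text out of each data row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table#prices tr")[1:]:  # skip the header row
        rows.append([td.get_text() for td in tr.find_all("td")])
    return rows


def clean_rows(rows):
    """Step 3: strip whitespace and turn price strings like '$1,299.00' into floats."""
    cleaned = []
    for name, price in rows:
        number = price.strip().lstrip("$").replace(",", "")
        cleaned.append({"name": name.strip(), "price": float(number)})
    return cleaned


def store_csv(records, file_obj):
    """Step 4: write the records as CSV to any file-like object."""
    writer = csv.DictWriter(file_obj, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(records)


# Offline demo: this string stands in for fetch_html("https://...")
sample_html = """
<table id="prices">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td> Laptop </td><td>$1,299.00</td></tr>
  <tr><td>Mouse</td><td> $25.50 </td></tr>
</table>
"""

records = clean_rows(parse_rows(sample_html))
buffer = io.StringIO()
store_csv(records, buffer)
print(buffer.getvalue())
```

<p><span style="font-size:16px;">On a real site you would call <code>fetch_html()</code> with the page URL and write to a file opened with <code>open(&#39;prices.csv&#39;, &#39;w&#39;, newline=&#39;&#39;)</code> instead of an in-memory buffer.</span></p>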

<ol start="2">
	<li><span style="font-size:16px;">Applications of Web Scraping: Web scraping has a wide range of applications across various domains:</span></li>
</ol>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;"><strong>Data Collection</strong>: Web scraping allows you to gather large amounts of data from websites efficiently. This data can be used for market research, competitor analysis, sentiment analysis, pricing comparisons, or any other data-driven decision-making process.</span></li>
	<li><span style="font-size:16px;"><strong>Content Aggregation</strong>: Scraping content from different websites enables you to aggregate information from multiple sources and create comprehensive databases or informational resources.</span></li>
	<li><span style="font-size:16px;"><strong>Lead Generation</strong>: Web scraping can help in extracting contact information, email addresses, or other relevant details from websites for lead generation purposes.</span></li>
	<li><span style="font-size:16px;"><strong>Market Research</strong>: Scraping data from e-commerce websites or social media platforms can provide insights into market trends, customer behavior, or product reviews.</span></li>
	<li><span style="font-size:16px;"><strong>News Monitoring</strong>: Web scraping can be used to monitor news articles or blogs to gather real-time information, detect trends, or track mentions of specific topics or keywords.</span></li>
	<li><span style="font-size:16px;"><strong>Financial Data Analysis</strong>: Scraping financial data from websites allows for analysis of stock prices, market trends, financial reports, and other related data.</span></li>
	<li><span style="font-size:16px;"><strong>Job Posting Analysis</strong>: Scraping job portals can provide information on job postings, salaries, skills in demand, or companies hiring in specific industries.</span></li>
	<li><span style="font-size:16px;"><strong>Academic Research</strong>: Web scraping helps researchers collect data for academic studies, for example text for sentiment analysis or posts for monitoring public opinion on specific topics over time.</span></li>
</ul>

<ol start="3">
	<li><span style="font-size:16px;">Legal and Ethical Considerations: When engaging in web scraping, it is important to consider legal and ethical aspects:</span></li>
</ol>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;"><strong>Terms of Service</strong>: Ensure compliance with the website&#39;s terms of service or usage policy. Some websites explicitly prohibit or restrict web scraping.</span></li>
	<li><span style="font-size:16px;"><strong>Copyright and Intellectual Property</strong>: Respect copyright and intellectual property rights by avoiding scraping of copyrighted content or confidential data.</span></li>
	<li><span style="font-size:16px;"><strong>Respectful Crawling</strong>: Be mindful of the server load and avoid causing disruption to the target website&#39;s performance by implementing appropriate delays and throttling mechanisms.</span></li>
	<li><span style="font-size:16px;"><strong>Personal Data and Privacy</strong>: Be cautious when dealing with personal data and sensitive information. Ensure compliance with data protection regulations and respect user privacy.</span></li>
</ul>
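<p><span style="font-size:16px;">One way to put respectful crawling into practice is to check a site&#39;s <code>robots.txt</code> rules and pause between requests. The sketch below uses Python&#39;s standard <code>urllib.robotparser</code> module. The robots.txt content, the user-agent name, and the <code>crawl_politely()</code> helper are hypothetical; for a live site you would call <code>set_url()</code> and <code>read()</code> instead of <code>parse()</code>. Honoring robots.txt is a convention and does not replace checking the terms of service.</span></p>

```python
import time
from urllib import robotparser

# Hypothetical robots.txt; for a live site use rp.set_url(".../robots.txt") then rp.read()
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "class10-scraper"  # hypothetical user-agent name

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def crawl_politely(urls, fetch, default_delay=1.0, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, waiting between requests."""
    # Use the site's Crawl-delay if it sets one, otherwise our own default
    delay = rp.crawl_delay(USER_AGENT) or default_delay
    results = {}
    for url in urls:
        if not rp.can_fetch(USER_AGENT, url):
            continue  # path is disallowed for our user agent
        if results:
            sleep(delay)  # throttle: pause before every request after the first
        results[url] = fetch(url)
    return results


# Demo: a fake fetch function, and a list that records waits instead of sleeping
waits = []
pages = crawl_politely(
    ["https://example.com/blog/1",
     "https://example.com/private/admin",
     "https://example.com/blog/2"],
    fetch=lambda url: "<html>" + url + "</html>",  # stand-in for requests.get(url).text
    sleep=waits.append,
)
print(list(pages), waits)
```

<p><span style="font-size:16px;">The disallowed <code>/private/</code> URL is skipped, and the two allowed requests are separated by the site&#39;s two-second crawl delay.</span></p>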


<h2>Introduction to HTML parsing and data extraction using BeautifulSoup</h2>

<ol>
	<li><span style="font-size:16px;">Installation: To get started, you need to install BeautifulSoup. You can use <code>pip</code> to install it by running the following command in your terminal or command prompt:</span></li>
</ol>

<p><span style="font-size:16px;"><code>pip install beautifulsoup4 </code></span></p>

<p>&nbsp;</p>

<ol start="2">
	<li><span style="font-size:16px;">Importing BeautifulSoup: To use BeautifulSoup in your Python code, you need to import it. Typically, you import it like this:</span></li>
</ol>

<p><span style="font-size:16px;"><code>from bs4 import BeautifulSoup</code></span></p>

<p>&nbsp;</p>

<ol start="3">
	<li><span style="font-size:16px;">Parsing HTML: To parse an HTML document, create a <code>BeautifulSoup</code> object from the HTML content and the name of a parser. BeautifulSoup supports several parsers, such as Python&#39;s built-in <code>&#39;html.parser&#39;</code>, <code>&#39;lxml&#39;</code>, and <code>&#39;html5lib&#39;</code>. The last two are separate packages that must be installed with <code>pip</code> before you can use them.</span></li>
</ol>


In [None]:
from bs4 import BeautifulSoup
html_content = '''
<html>
<head>
    <title>My Webpage</title>
</head>
<body>
    <h1>Welcome to my webpage</h1>
    <p>This is a paragraph of text.</p>
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
        <li>Item 3</li>
    </ul>
</body>
</html>
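'''

# Create the BeautifulSoup object with Python's built-in parser
soup = BeautifulSoup(html_content, 'html.parser')
print(soup.title.text)  # My Webpage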

<ol start="4">
	<li><span style="font-size:16px;">Navigating the Document Structure: Once you have the <code>BeautifulSoup</code> object, you can navigate the HTML document&#39;s structure using various methods and properties provided by BeautifulSoup. Some common ones include:</span></li>
</ol>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;"><code>soup.title</code>: Accesses the title tag.</span></li>
	<li><span style="font-size:16px;"><code>soup.body</code>: Accesses the body tag.</span></li>
	<li><span style="font-size:16px;"><code>soup.find(&#39;tag&#39;)</code>: Finds the first occurrence of the specified tag.</span></li>
	<li><span style="font-size:16px;"><code>soup.find_all(&#39;tag&#39;)</code>: Finds all occurrences of the specified tag.</span></li>
	<li><span style="font-size:16px;"><code>element.text</code>: Extracts the text content of an element.</span></li>
	<li><span style="font-size:16px;"><code>element[&#39;attribute&#39;]</code>: Accesses the value of a specific attribute of an element.</span></li>
</ul>
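<p><span style="font-size:16px;">The properties and methods above can be tried on a small document. This sketch reuses the sample page from the parsing example, with one hypothetical link added so that attribute access has something to read.</span></p>

```python
from bs4 import BeautifulSoup

# Sample page from the parsing example, plus an <a> tag added for illustration
html_content = """
<html>
<head><title>My Webpage</title></head>
<body>
    <h1>Welcome to my webpage</h1>
    <p>This is a paragraph of text. See <a href="https://example.com/about">About</a>.</p>
    <ul>
        <li>Item 1</li>
        <li>Item 2</li>
        <li>Item 3</li>
    </ul>
</body>
</html>
"""

soup = BeautifulSoup(html_content, "html.parser")

page_title = soup.title.text                      # text inside the <title> tag
heading = soup.find("h1").text                    # first <h1> in the document
items = [li.text for li in soup.find_all("li")]   # every <li>, in document order
link_href = soup.find("a")["href"]                # value of the href attribute

print(page_title)
print(heading)
print(items)
print(link_href)
```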

<ol start="5">
	<li><span style="font-size:16px;">Data Extraction: BeautifulSoup provides various methods to extract data from HTML elements based on different criteria like tag names, class names, attribute values, or CSS selectors. Some commonly used methods include:</span></li>
</ol>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;"><code>find()</code>: Finds the first occurrence of an element based on a specific criterion.</span></li>
	<li><span style="font-size:16px;"><code>find_all()</code>: Finds all occurrences of elements based on a specific criterion.</span></li>
	<li><span style="font-size:16px;"><code>select()</code>: Finds elements based on CSS selectors.</span></li>
</ul>
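<p><span style="font-size:16px;">The following sketch contrasts the three methods on an invented product listing; the class names, the <code>data-sku</code> attribute, and the prices are hypothetical. Note the <code>class_</code> keyword (with a trailing underscore), which BeautifulSoup uses because <code>class</code> is a reserved word in Python.</span></p>

```python
from bs4 import BeautifulSoup

# Hypothetical product listing, invented for illustration
html = """
<div id="products">
  <div class="product" data-sku="A1"><span class="name">Pen</span><span class="price">1.50</span></div>
  <div class="product sale" data-sku="B2"><span class="name">Notebook</span><span class="price">3.00</span></div>
  <div class="product" data-sku="C3"><span class="name">Stapler</span><span class="price">7.25</span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find(): first <div> whose class list contains "product"
first = soup.find("div", class_="product")
first_name = first.find("span", class_="name").text

# find_all() with an attribute filter: True matches any value of data-sku
skus = [div["data-sku"] for div in soup.find_all("div", attrs={"data-sku": True})]

# select(): CSS selectors; "div.product.sale" requires both classes
on_sale = [el.get_text() for el in soup.select("div.product.sale span.name")]

# select() with an id selector, then convert the price text to numbers
prices = [float(span.text) for span in soup.select("#products span.price")]

print(first_name, skus, on_sale, prices)
```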

<h2>Making HTTP requests and interacting with APIs</h2>

<ol start="1">
	<li><span style="font-size:16px;">Working with the <code>requests</code> library</span></li>
</ol>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;">Installing the Requests Library: Before you start, you need to install the <code>requests</code> library. You can use <code>pip</code> to install it by running the following command in your terminal or command prompt:&nbsp;</span></li>
</ul>

<p><span style="font-size:16px;"><code>pip install requests</code></span></p>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;">Importing the Requests Library: To use the <code>requests</code> library in your Python code, you need to import it:</span></li>
</ul>

<p><span style="font-size:16px;"><code>import requests</code></span></p>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;"><strong>GET Requests</strong>: To retrieve data from a server, use the <code>requests.get()</code> method. It returns a <code>Response</code> object whose <code>status_code</code>, <code>text</code>, and <code>headers</code> attributes describe the server&#39;s reply.</span></li>
	<li><span style="font-size:16px;"><strong>POST Requests</strong>: To send data to a server, use the <code>requests.post()</code> method. Pass form fields with the <code>data</code> parameter, or a JSON body with the <code>json</code> parameter.</span></li>
</ul>
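<p><span style="font-size:16px;">A minimal sketch of both request types follows. The helper names are hypothetical, and <code>https://httpbin.org</code> is a public request-echo service used here only as an example endpoint. The live helpers need network access, so the last part of the block builds the same requests with <code>requests.Request(...).prepare()</code>, which shows what would be sent without contacting any server.</span></p>

```python
import requests


# --- Live helpers (need network access) ---
def fetch_page(url, params=None):
    """GET: retrieve a resource and return the response body as text."""
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx
    return response.text


def submit_form(url, fields):
    """POST: send form fields in the request body and return the decoded JSON reply."""
    response = requests.post(url, data=fields, timeout=10)
    response.raise_for_status()
    return response.json()


# --- Offline look at what those calls would send ---
get_req = requests.Request("GET", "https://httpbin.org/get").prepare()
post_req = requests.Request(
    "POST", "https://httpbin.org/post", data={"name": "Ada", "course": "Class 10"}
).prepare()

print(get_req.method, get_req.url)
print(post_req.method, post_req.headers["Content-Type"])
print(post_req.body)  # form fields, URL-encoded into the request body
```

<p><span style="font-size:16px;">With network access, <code>fetch_page(&#39;https://httpbin.org/get&#39;)</code> returns the page text, and <code>submit_form(&#39;https://httpbin.org/post&#39;, {&#39;name&#39;: &#39;Ada&#39;})</code> returns httpbin&#39;s JSON echo of the submitted form. Setting a <code>timeout</code> keeps a script from hanging indefinitely on an unresponsive server.</span></p>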

<h2>Parsing JSON responses and working with API data in Python</h2>

<ol>
	<li><span style="font-size:16px;">Handling JSON Responses: Many APIs return data in JSON format. The <code>.json()</code> method of the response object decodes the JSON body into Python objects: usually a dictionary, or a list if the top-level JSON value is an array.</span></li>
</ol>
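<p><span style="font-size:16px;"><code>response.json()</code> essentially runs the response text through a JSON decoder, so the same Python objects come out as with the standard <code>json.loads()</code>. To keep the example offline, the sketch decodes an invented JSON string with <code>json.loads()</code> and shows how JSON values map to Python types and how to read nested fields safely.</span></p>

```python
import json

# Invented JSON body, similar to what an API might return
body = '{"user": {"name": "Ada", "langs": ["Python", "C"]}, "active": true, "score": null}'

data = json.loads(body)  # with requests: data = response.json()

name = data["user"]["name"]           # nested object -> nested dict
first_lang = data["user"]["langs"][0] # JSON array -> Python list
active = data["active"]               # JSON true -> Python True
score = data["score"]                 # JSON null -> Python None
email = data.get("email", "n/a")      # .get() avoids KeyError for optional fields

print(type(data).__name__, name, first_lang, active, score, email)
```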


<ol start="2">
	<li><span style="font-size:16px;">Handling Query Parameters: For GET requests, pass query parameters as a dictionary through the <code>params</code> parameter of <code>requests.get()</code>. The library URL-encodes them and appends them to the URL, so you do not have to build the query string by hand.</span></li>
</ol>
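<p><span style="font-size:16px;">The sketch below defines a hypothetical <code>search()</code> helper for an invented endpoint (<code>https://api.example.com/search</code>), then prepares the same request offline to show the URL that <code>requests</code> builds from the <code>params</code> dictionary.</span></p>

```python
import requests

SEARCH_URL = "https://api.example.com/search"  # invented endpoint


def search(query, page=1, lang=None):
    """Hypothetical API call; requests encodes params into the query string."""
    return requests.get(SEARCH_URL, params={"q": query, "page": page, "lang": lang}, timeout=10)


# Offline: prepare the same request to inspect the URL requests would send
prepared = requests.Request(
    "GET", SEARCH_URL, params={"q": "web scraping", "page": 2, "lang": None}
).prepare()
print(prepared.url)  # spaces are encoded, and the None-valued "lang" is dropped
```

<p><span style="font-size:16px;">Parameters whose value is <code>None</code> are left out of the query string, which is convenient for optional filters.</span></p>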

<ol start="3">
	<li><span style="font-size:16px;">Handling Authentication: Some APIs require authentication to access their data. For HTTP Basic authentication, pass a <code>(username, password)</code> tuple through the <code>auth</code> parameter of <code>requests.get()</code> or <code>requests.post()</code>. Other APIs expect a key or token instead, sent as a query parameter or a request header.</span></li>
</ol>
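<p><span style="font-size:16px;">For HTTP Basic authentication, <code>requests</code> turns the credentials into an <code>Authorization</code> header. The sketch prepares a request offline with placeholder credentials and an invented URL to show the header that would be sent. Passing the tuple <code>(&#39;user&#39;, &#39;pass&#39;)</code> as <code>auth</code> is shorthand for <code>HTTPBasicAuth(&#39;user&#39;, &#39;pass&#39;)</code>.</span></p>

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

# Placeholder credentials and an invented endpoint
req = requests.Request(
    "GET", "https://api.example.com/me", auth=HTTPBasicAuth("user", "pass")
).prepare()
print(req.headers["Authorization"])

# The header is "Basic " followed by base64("user:pass")
expected = "Basic " + base64.b64encode(b"user:pass").decode()
print(req.headers["Authorization"] == expected)

# Live equivalent: requests.get("https://api.example.com/me", auth=("user", "pass"), timeout=10)
```

<p><span style="font-size:16px;">Basic credentials are only base64-encoded, not encrypted, so send them over HTTPS. APIs that use keys, such as the WeatherStack <code>access_key</code> query parameter in the challenge below, do not need the <code>auth</code> parameter at all.</span></p>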

<p><span style="font-size:16px;">Challenge: Weather Forecast </span></p>

<p><span style="font-size:16px;">Write a Python program that uses the WeatherStack API to fetch the current weather for a specified city and display it to the user.</span></p>

<p><span style="font-size:16px;">Requirements:</span></p>

<ol>
	<li><span style="font-size:16px;">You need to sign up on the WeatherStack website to get an API access key. The key is required for every request to their API.</span></li>
	<li><span style="font-size:16px;">The program should prompt the user to enter the name of a city.</span></li>
	<li><span style="font-size:16px;">It should then make an HTTP GET request to the WeatherStack API to fetch the weather data for the specified city.</span></li>
	<li><span style="font-size:16px;">Parse the JSON response to extract and display the following information:</span></li>
</ol>

<ul style="margin-left: 40px;">
	<li><span style="font-size:16px;">City name</span></li>
	<li><span style="font-size:16px;">Country</span></li>
	<li><span style="font-size:16px;">Temperature (in Celsius)</span></li>
	<li><span style="font-size:16px;">Weather description (e.g., sunny, cloudy, etc.)</span></li>
</ul>


In [None]:
# Sample WeatherStack "current" response for New York (units=m, metric), as a Python dict.
# JSON's escaped slashes ("\/") are written as plain "/" here, since "\/" is not a valid Python escape.
response_json = {"request":{"type":"City","query":"New York, United States of America",
                            "language":"en",
                            "unit":"m"},
                  "location":{"name":"New York",
                              "country":"United States of America",
                              "region":"New York",
                              "lat":"40.714",
                              "lon":"-74.006",
                              "timezone_id":"America/New_York",
                              "localtime":"2023-07-19 21:06",
                              "localtime_epoch":1689800760,
                              "utc_offset":"-4.0"},
                  "current":{"observation_time":"01:06 AM",
                             "temperature":25,
                             "weather_code":143,
                             "weather_icons":["https://cdn.worldweatheronline.com/images/wsymbols01_png_64/wsymbol_0006_mist.png"],
                             "weather_descriptions":["Haze"],
                             "wind_speed":4,
                             "wind_degree":10,
                             "wind_dir":"N",
                             "pressure":1017,
                             "precip":0,
                             "humidity":77,
                             "cloudcover":25,
                             "feelslike":28,
                             "uv_index":1,
                             "visibility":8,
                             "is_day":"no"}}
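<p><span style="font-size:16px;">Before calling the live API, the extraction logic can be checked against the sample response above. This sketch copies only the fields it needs into a trimmed dictionary. The <code>extract_weather()</code> helper is hypothetical, and the assumption that error responses carry an <code>&quot;error&quot;</code> key should be checked against the WeatherStack documentation.</span></p>

```python
def extract_weather(data):
    """Return the four fields the challenge asks for, or None for an error payload."""
    # Assumption: WeatherStack-style error responses contain an "error" object
    if "error" in data:
        return None
    location = data["location"]
    current = data["current"]
    return {
        "city": location["name"],
        "country": location["country"],
        "temperature_c": current["temperature"],            # units=m -> Celsius
        "description": current["weather_descriptions"][0],  # first description string
    }


# Trimmed copy of the sample response for New York
sample = {
    "location": {"name": "New York", "country": "United States of America"},
    "current": {"temperature": 25, "weather_descriptions": ["Haze"]},
}

print(extract_weather(sample))
print(extract_weather({"success": False, "error": {"code": 101}}))
```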



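In [None]:
# API key cell: the original key was hidden. Never publish a real key in a shared notebook;
# this cell reads it from an environment variable (the name WEATHERSTACK_API_KEY is our own choice).
import os

api_key = os.environ.get("WEATHERSTACK_API_KEY", "")

In [None]:
import requests

def get_weather_data(city, api_key):
    """Return (city_name, country, temperature, weather_desc), or None on any failure."""
    url = "http://api.weatherstack.com/current"
    # Let requests URL-encode the query string (city names may contain spaces or commas)
    params = {"access_key": api_key, "query": city, "units": "m"}  # units=m: metric (Celsius)

    try:
        response = requests.get(url, params=params, timeout=10)
    except requests.RequestException:
        return None  # connection error, timeout, etc.

    if response.status_code != 200:
        return None

    data = response.json()
    # WeatherStack can report problems such as an invalid key or unknown city
    # in a 200 response that carries an "error" object instead of weather data
    if "error" in data:
        return None

    city_name = data["location"]["name"]
    country = data["location"]["country"]
    temperature = data["current"]["temperature"]
    weather_desc = data["current"]["weather_descriptions"][0]

    return city_name, country, temperature, weather_desc

def main():
    city = input("Enter the name of a city: ")
    weather_data = get_weather_data(city, api_key)

    if weather_data:
        city_name, country, temperature, weather_desc = weather_data
        print(f"\nCurrent weather for {city_name}, {country}:")
        print(f"Temperature: {temperature}°C")
        print(f"Weather: {weather_desc.capitalize()}")
    else:
        print("Error fetching weather data. Please try again later.")

if __name__ == "__main__":
    main()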
