# CodeAlpha Task 1
# Web Scraping


```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```


```python
# Base URL
url = 'http://quotes.toscrape.com/'
url
```

```
'http://quotes.toscrape.com/'
```

```python
# Send request to website
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
soup
```

```html
<!DOCTYPE html>

<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Quotes to Scrape</title>
<link href="/static/bootstrap.min.css" rel="stylesheet"/>
<link href="/static/main.css" rel="stylesheet"/>
</head>
<body>
<div class="container">
<div class="row header-box">
<div class="col-md-8">
<h1>
<a href="/" style="text-decoration: none">Quotes to Scrape</a>
</h1>
</div>
<div class="col-md-4">
<p>
<a href="/login">Login</a>
</p>
</div>
</div>
<div class="row">
<div class="col-md-8">
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
            Tags:
            <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
...
```
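The dump above shows the markup the extraction step depends on. Each quote sits in a `div.quote`, with the text in `span.text` and the author in `small.author`. The tags also appear as one comma-separated string in the `content` attribute of a `meta.keywords` tag. Below is a minimal sketch that pulls the same fields with CSS selectors (`select` / `select_one`). It runs against a trimmed copy of the first quote block from that output; the closing tags were added here so the snippet parses cleanly. The helper name `parse_quote_block` is illustrative only, and the snippet uses its own `snippet_soup` variable so it does not overwrite the notebook's `soup`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first quote block from the page dump above;
# closing tags were added so the snippet is well-formed.
SNIPPET = """
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
</div>
</div>
"""


def parse_quote_block(block):
    """Extract text, author and tags from one div.quote element (hypothetical helper)."""
    text = block.select_one("span.text").get_text(strip=True)
    author = block.select_one("small.author").get_text(strip=True)
    # The meta tag stores every tag of the quote in one comma-separated string.
    keywords = block.select_one("meta.keywords")
    tags = keywords["content"].split(",") if keywords else []
    return {"Quote": text, "Author": author, "Tags": tags}


# Separate name so the notebook's page-level `soup` is left untouched.
snippet_soup = BeautifulSoup(SNIPPET, "html.parser")
for block in snippet_soup.select("div.quote"):
    print(parse_quote_block(block))
```

The `meta.keywords` route is useful when the visible tag links are hard to select. On the live page, the `a.tag` links used in the extraction loop carry the same tag names.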

```python
print(response.status_code)
```

```
200
```
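The notebook prints the status code only after the page has already been parsed, so an error page would silently produce an empty result. Below is a minimal sketch that checks the response before parsing. `raise_for_status()` raises `requests.HTTPError` for any 4xx or 5xx status. The helper name `fetch_html`, its `client` parameter and the 10-second timeout are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, client=requests, timeout=10.0):
    """Return the page body, raising requests.HTTPError on a 4xx/5xx status.

    `client` is anything with a requests-style .get(): the requests module
    itself works, and so does a requests.Session for connection reuse.
    """
    response = client.get(url, timeout=timeout)  # timeout stops the call hanging forever
    response.raise_for_status()                  # fail before parsing an error page
    return response.text
```

In the notebook this would replace the first two lines of the request cell: `html = fetch_html(url)`, followed by `soup = BeautifulSoup(html, 'html.parser')`.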


```python
quotes = []
authors = []
tags = []

# Find all quote blocks
quote_blocks = soup.find_all('div', class_='quote')

# Loop through each quote block and extract data
for block in quote_blocks:
    quote = block.find('span', class_='text').text.strip()
    author = block.find('small', class_='author').text.strip()
    tag_list = [tag.text for tag in block.find_all('a', class_='tag')]
    
    quotes.append(quote)
    authors.append(author)
    tags.append(tag_list)
```
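The loop above only reads the first page, but quotes.toscrape.com spreads its quotes over several pages, each linking to the next. Below is a sketch that follows those links. It assumes the link is the `a` inside `li.next`, which matches this site at the time of writing but is worth re-checking. The helpers `parse_page` and `scrape_all_pages` are hypothetical names. The `max_pages` cap and the one-second delay are arbitrary, polite defaults.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def parse_page(html):
    """Return (rows, next_href or None) for one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="quote"):
        rows.append({
            "Quote": block.find("span", class_="text").text.strip(),
            "Author": block.find("small", class_="author").text.strip(),
            "Tags": [a.text for a in block.find_all("a", class_="tag")],
        })
    next_link = soup.select_one("li.next > a")  # assumed pagination markup
    return rows, (next_link["href"] if next_link else None)


def scrape_all_pages(start_url, max_pages=50, delay=1.0):
    """Follow 'next' links from start_url, collecting rows from every page."""
    rows, url = [], start_url
    for _ in range(max_pages):               # hard cap in case links ever loop
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        page_rows, next_href = parse_page(response.text)
        rows.extend(page_rows)
        if next_href is None:                 # last page has no 'next' link
            break
        url = urljoin(url, next_href)         # links on this site are root-relative
        time.sleep(delay)                     # pause between requests
    return rows
```

Usage: `df = pd.DataFrame(scrape_all_pages('http://quotes.toscrape.com/'))` builds the same three columns as the single-page version.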


```python
df = pd.DataFrame({
    'Quote': quotes,
    'Author': authors,
    'Tags': tags
})

# Show the first few rows of the DataFrame
df.head()
```


|   | Quote | Author | Tags |
|---|-------|--------|------|
| 0 | “The world as we have created it is a process ... | Albert Einstein | [change, deep-thoughts, thinking, world] |
| 1 | “It is our choices, Harry, that show what we t... | J.K. Rowling | [abilities, choices] |
| 2 | “There are only two ways to live your life. On... | Albert Einstein | [inspirational, life, live, miracle, miracles] |
| 3 | “The person, be it gentleman or lady, who has ... | Jane Austen | [aliteracy, books, classic, humor] |
| 4 | “Imperfection is beauty, madness is genius and... | Marilyn Monroe | [be-yourself, inspirational] |
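Because `Tags` holds a Python list in every row, per-tag questions need one row per tag first. Below is a short sketch using `DataFrame.explode` that counts how often each tag and author appears. It runs on the five rows displayed above, with authors and tags copied from that output and the quote text left out. On the full `df`, the same calls apply unchanged: `df.explode('Tags')['Tags'].value_counts()`.

```python
import pandas as pd

# Authors and tags copied from the df.head() output above.
sample = pd.DataFrame({
    "Author": ["Albert Einstein", "J.K. Rowling", "Albert Einstein",
               "Jane Austen", "Marilyn Monroe"],
    "Tags": [["change", "deep-thoughts", "thinking", "world"],
             ["abilities", "choices"],
             ["inspirational", "life", "live", "miracle", "miracles"],
             ["aliteracy", "books", "classic", "humor"],
             ["be-yourself", "inspirational"]],
})

# explode() gives each list element its own row, repeating the Author value.
tag_counts = sample.explode("Tags")["Tags"].value_counts()
author_counts = sample["Author"].value_counts()

print(tag_counts.head(3))
print(author_counts)
```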


```python
df.to_csv('quotes.csv', index=False)
print("CSV file saved successfully!")
```

```
CSV file saved successfully!
```
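When saved this way, each `Tags` cell in `quotes.csv` holds the list's string form (for example `['change', 'deep-thoughts', 'thinking', 'world']`). It reads back as plain text rather than a list. Below is a sketch that stores the tags as one delimiter-separated string and splits them again on load. The helper names `save_quotes` / `load_quotes` and the `|` delimiter are arbitrary illustrative choices. An in-memory buffer and invented placeholder rows stand in for `quotes.csv` and the scraped data.

```python
import io

import pandas as pd


def save_quotes(frame, path_or_buf, sep="|"):
    """Write quotes with each tag list flattened to one 'a|b|c' string."""
    flat = frame.assign(Tags=frame["Tags"].str.join(sep))
    flat.to_csv(path_or_buf, index=False)


def load_quotes(path_or_buf, sep="|"):
    """Read the CSV back and turn each tag string into a list again."""
    # keep_default_na=False keeps an empty tag cell as "" instead of NaN.
    frame = pd.read_csv(path_or_buf, keep_default_na=False)
    frame["Tags"] = frame["Tags"].apply(lambda s: s.split(sep) if s else [])
    return frame


# Round trip through an in-memory buffer with placeholder rows.
original = pd.DataFrame({
    "Quote": ["Quote one", "Quote two"],
    "Author": ["Albert Einstein", "J.K. Rowling"],
    "Tags": [["change", "deep-thoughts"], ["abilities", "choices"]],
})
buffer = io.StringIO()
save_quotes(original, buffer)
buffer.seek(0)
restored = load_quotes(buffer)
print(restored["Tags"].tolist())  # [['change', 'deep-thoughts'], ['abilities', 'choices']]
```

In the notebook, the save step would be `save_quotes(df, 'quotes.csv')`.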


## 🔍 Insight 1: Missing Values
The Titanic passenger dataset contains missing values in three columns:

- **Age**: ~20% missing values  
- **Cabin**: Over 75% missing values  
- **Embarked**: Very few missing entries  

**➡️ Strategy:**  
- Impute **Age** with median or mean  
- Drop the **Cabin** column  
- Fill **Embarked** with the most frequent value  
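The three-step strategy above can be sketched in pandas. The column names follow the commonly used Kaggle Titanic CSV, and the rows are invented for illustration; both are assumptions. The median is used for **Age** because it is less sensitive to a skewed age distribution than the mean.

```python
import numpy as np
import pandas as pd

# Invented rows; column names assumed to match the Kaggle Titanic CSV.
titanic = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S", "S"],
})

print(titanic.isna().mean())  # share of missing values per column

cleaned = (
    titanic
    .assign(
        Age=titanic["Age"].fillna(titanic["Age"].median()),                # median imputation
        Embarked=titanic["Embarked"].fillna(titanic["Embarked"].mode()[0]),  # most frequent port
    )
    .drop(columns="Cabin")  # mostly empty, so dropped
)
print(cleaned)
```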

---

## 📊 Insight 2: Survival by Gender
- Female passengers had a significantly higher survival rate than males.  
- This is consistent with women (and children, see Insight 4) being given priority during the evacuation.

**➡️ Conclusion:**  
- **Gender** is an important feature in predicting survival.
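A survival rate per group is simply the mean of a 0/1 survival flag within that group. Below is a minimal sketch on a handful of invented rows. It assumes a `Survived` column coded 1/0 and a `Sex` column, as in the Kaggle Titanic CSV.

```python
import pandas as pd

# Invented passengers; Survived coded 1 = survived, 0 = did not (assumed).
passengers = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 0, 0, 1, 0],
})

# The mean of a 0/1 column within each group is that group's survival rate.
survival_by_sex = passengers.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```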

---

## 🏷️ Insight 3: Survival by Class
- Passengers in **1st class** had a higher survival rate than those in **2nd** and **3rd** class.  
- **3rd class** had the lowest survival rate.

**➡️ Conclusion:**  
- **Socio-economic status** influenced the chance of survival.
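The comparison by class can be laid out with `pd.crosstab`. With `normalize="index"` each row sums to 1, so the `1` column reads as the survival rate for that class. The column names (`Pclass`, `Survived`) and the rows are assumptions made for illustration.

```python
import pandas as pd

# Invented passengers; Pclass 1/2/3 and Survived 1/0 as in the Kaggle CSV (assumed).
passengers = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Row-normalised table: share of non-survivors (0) and survivors (1) per class.
rates = pd.crosstab(passengers["Pclass"], passengers["Survived"], normalize="index")
print(rates)
```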

---

## 📈 Insight 4: Age Distribution
- Most passengers were aged **20 to 40**.  
- Children and elderly passengers were comparatively few on board.  
- Survival rate for **children (age < 10)** was relatively higher.

**➡️ Conclusion:**  
- **Age** shows a **non-linear** relationship with survival.
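One way to expose a non-linear pattern is to bin ages and compare survival per band, instead of fitting a single trend. Below is a sketch using `pd.cut` with `right=False`, so each band includes its lower edge, and the first band matches "age < 10". The band edges, the `Age`/`Survived` column names and the rows are illustrative assumptions.

```python
import pandas as pd

# Invented passengers for illustration.
passengers = pd.DataFrame({
    "Age": [4, 8, 15, 22, 27, 33, 38, 45, 62, 70],
    "Survived": [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
})

bins = [0, 10, 20, 40, 60, 120]
labels = ["<10", "10-19", "20-39", "40-59", "60+"]
# right=False makes the intervals [0, 10), [10, 20), ... so age 10 is not a child.
passengers["AgeBand"] = pd.cut(passengers["Age"], bins=bins, labels=labels, right=False)

# Count and survival rate per band; a rise-and-fall across bands signals non-linearity.
stats = passengers.groupby("AgeBand", observed=True)["Survived"].agg(["count", "mean"])
print(stats)
```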

---

## 🧍 Insight 5: Family Size Effect
- Passengers with **small families (1–3 members)** had better chances of survival.  
- Those traveling **alone** or with **large families** had lower survival rates.

**➡️ Conclusion:**  
- A **moderate family size** may offer social support during emergencies.
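Family size is not a column of its own, so a common approach is to derive it from the sibling/spouse and parent/child counts. This sketch assumes the Kaggle `SibSp` and `Parch` columns and uses invented rows. It treats 0 relatives as traveling alone and 1-3 relatives as a small family, which is one reading of the grouping above.

```python
import pandas as pd

# Invented passengers; SibSp = siblings/spouses aboard, Parch = parents/children aboard (assumed).
passengers = pd.DataFrame({
    "SibSp": [0, 1, 1, 0, 4, 0, 1, 3],
    "Parch": [0, 0, 2, 1, 2, 0, 1, 2],
    "Survived": [0, 1, 1, 0, 0, 1, 1, 0],
})

passengers["Relatives"] = passengers["SibSp"] + passengers["Parch"]
# Right-closed bins: (-1, 0] alone, (0, 3] small family, (3, 20] large family.
passengers["FamilyGroup"] = pd.cut(
    passengers["Relatives"],
    bins=[-1, 0, 3, 20],
    labels=["alone", "small (1-3)", "large (4+)"],
)

family_rates = passengers.groupby("FamilyGroup", observed=True)["Survived"].mean()
print(family_rates)
```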

---

## ⚓ Insight 6: Embarkation Port
- Most passengers embarked from **Southampton (S)**.  
- Passengers from **Cherbourg (C)** had higher survival rates.

**➡️ Conclusion:**  
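- The higher survival rate for **Cherbourg** may reflect a correlation between **embarkation port** and **passenger class** or **cabin location**, rather than an effect of the port itself.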
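Whether port and class overlap can be checked by tabulating the class mix at each port. Below is a sketch with `pd.crosstab` on invented rows. It assumes `Embarked` holds the port codes (S, C, Q) and `Pclass` the passenger class. A port with a much larger first-class share would support the class explanation.

```python
import pandas as pd

# Invented passengers; column names and codes assumed from the Kaggle CSV.
passengers = pd.DataFrame({
    "Embarked": ["S", "S", "S", "S", "C", "C", "C", "Q"],
    "Pclass":   [3, 3, 2, 1, 1, 1, 3, 3],
})

# Share of each class among passengers from each port (rows sum to 1).
class_mix = pd.crosstab(passengers["Embarked"], passengers["Pclass"], normalize="index")
print(class_mix.round(2))
```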
