# Your Project Title

#### Your Name

In this assignment, you will create a notebook that uses data from the ISLP module and web-scraped data from Wikipedia. The goal is to extract real-world data, process it, and present it in a user-friendly format.

<div style="background-color: #fff2cc; border-left: 6px solid #f1c232; color: #000; padding: 10px;">
You may use AI to assist you in writing the code for this project, but you must link the transcripts in a references section at the bottom of the notebook. The exposition should be your own, though. 
Any code that is beyond the scope of this course should include a reference to documentation, a tutorial, or a generative AI chat.
</div>

**🗑️ Delete this instruction cell after completing the instruction below.**

Rename the notebook to `lastname-project.ipynb`, replacing `lastname` with your actual last name. 


## Part 1: Selecting a Dataset

ISLP is the Python companion to *An Introduction to Statistical Learning*. It includes several pedagogically curated datasets across domains (marketing, finance, health, etc.).

Skim the ISLP documentation: [https://islp.readthedocs.io/en/latest/index.html](https://islp.readthedocs.io/en/latest/index.html). Open the **“Datasets used in ISLP”** page and browse the available options.

**🗑️ Delete this instruction cell after completing the instruction below.**

Pick **two to three** datasets that interest you. Read their descriptions to understand variables, units, and potential questions they could answer. Then **choose one** dataset for your project and describe it in the cell below.

(You may delete this cell for your submission and presentation)

The dataset I have chosen for my project is... (complete this sentence with a brief description of the dataset).

The features of this dataset are... (list and describe the data features/columns).

* `feature 1`: description
* `feature 2`: description

## Part 2: Loading Data from a Library

Install the necessary libraries:

In [None]:
!pip install ISLP beautifulsoup4 pandas

**🗑️ Delete this instruction cell after completing the instruction below.**

If `pip` doesn’t work on your machine, try `pip3` or `python -m pip install ISLP`. Depending on your setup, `python3 -m pip` or `py -m pip` may be required.
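If you are unsure whether the install reached the kernel your notebook is running, a quick check like the one below can help. It is only a sketch using the standard library; the helper name `check_modules` is made up for illustration, and note that `beautifulsoup4` is imported under the name `bs4`.

```python
import importlib.util

# Import names can differ from pip names: beautifulsoup4 is imported as bs4.
REQUIRED_MODULES = ["ISLP", "bs4", "pandas"]


def check_modules(names):
    """Return {module_name: bool} saying whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(REQUIRED_MODULES)
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'missing'}")
```

Any module reported as missing was probably installed into a different Python environment than the one the notebook uses.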

**🗑️ Delete this instruction cell after completing the instruction below.**

Load the chosen dataset following the ISLP docs. Use `pandas` to inspect: `.head()`, `.info()`, `.describe()`, and quick value counts as appropriate, adding the necessary code and markdown cells to your notebook.
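As a rough illustration of the inspection calls named above, the sketch below bundles them into one helper. The function name `quick_look`, the `max_unique` threshold, and the toy data are all assumptions made up for this example; in your notebook the frame would instead come from ISLP (the docs show calls of the form `load_data("Auto")`, where the dataset name is only an example).

```python
import pandas as pd


def quick_look(df, max_unique=3):
    """Print first-pass views of df; return the columns given value counts."""
    print(df.head())       # first five rows
    df.info()              # dtypes and non-null counts (prints directly)
    print(df.describe())   # summary statistics for numeric columns
    counted = []
    for col in df.columns:
        # Value counts are only readable for columns with few distinct values.
        if df[col].nunique() <= max_unique:
            print(df[col].value_counts(dropna=False))
            counted.append(col)
    return counted


# Hypothetical stand-in data, made up for illustration (not from ISLP).
toy = pd.DataFrame(
    {
        "group": ["a", "b", "a", "c", "b", "a"],
        "value": [1.5, 2.0, None, 4.0, 3.2, 0.7],
    }
)
counted_columns = quick_look(toy)
```

In a real notebook it usually reads better to run each call in its own cell, so that every output can be followed by a short Markdown comment.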

In [None]:
import pandas as pd
from ISLP import load_data

# Look at the documentation to see how to load a specific dataset


In [None]:
# Inspect the head


In [None]:
# Inspect the info


In [None]:
# Describe the data


In [None]:
# Consider other methods in pandas to explore the data


This dataset ... (write a brief summary of your findings from this initial exploration of the dataset. What stands out? Any surprising values or distributions?).


(In this Markdown cell, pose a guiding question you plan to answer using this data and explain why the question is relevant.)

## Part 3: Scraping Data from Wikipedia

**🗑️ Delete this instruction cell after completing the instruction below.**

Many analyses benefit from a small “side dataset”: a lookup table, a list of categories, a ranking, or a time index. Wikipedia often provides simple HTML tables that are easy to parse.

Identify a relevant Wikipedia page whose content complements your ISLP dataset (e.g., a table of regions, categories, industry codes, teams, seasons, etc.). Prefer a page with a clean HTML table.

The Wikipedia page I have chosen is... (complete this sentence with the URL of the Wikipedia page and the date accessed).

**🗑️ Delete this instruction cell after completing the instruction below.**

Use the Beautiful Soup library to scrape data from Wikipedia and load the data into a `pandas.DataFrame`. Perform minimal cleaning. For example:

* Rename columns to `snake_case`,  
* Trim whitespace,  
* Convert numeric columns,  
* Drop obviously empty rows.
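The scraping and cleaning steps above might look roughly like the following sketch. To keep it runnable offline, it parses a small inline HTML string with made-up values instead of a live page; in your notebook the HTML would come from an HTTP request to your chosen URL (for example with the `requests` package, which is an assumption and not part of the install line). The helper names, the `wikitable` class, and the column names are illustrative only.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Stand-in HTML with made-up values so the sketch runs without a network.
HTML = """
<table class="wikitable">
  <tr><th>Region Name</th><th>Population (2020)</th></tr>
  <tr><td> North </td><td>1,200</td></tr>
  <tr><td>South</td><td>3,450</td></tr>
  <tr><td></td><td></td></tr>
</table>
"""


def table_to_frame(html, table_class="wikitable"):
    """Parse the first table with the given class into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    rows = table.find_all("tr")
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    body = [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(body, columns=header)


def to_snake_case(name):
    """Lowercase a name and replace runs of other characters with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def clean(df, numeric_cols):
    """Apply the four minimal cleaning steps from the list above."""
    out = df.rename(columns=to_snake_case)           # snake_case column names
    out = out.apply(lambda col: col.str.strip())     # trim whitespace
    out = out[(out != "").any(axis=1)].copy()        # drop fully empty rows
    for col in numeric_cols:                         # convert numeric columns
        out[col] = pd.to_numeric(out[col].str.replace(",", "", regex=False))
    return out.reset_index(drop=True)


raw = table_to_frame(HTML)
regions = clean(raw, numeric_cols=["population_2020"])
print(regions)
```

Real Wikipedia tables often contain footnote markers, merged cells, or multi-row headers, so expect to adapt the parsing to the page you pick.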

In brief exposition cells, explain the Wikipedia data’s provenance (URL + date accessed), what it contains, and how it will join or relate to your ISLP data (key columns, expected cardinality).
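One way to check the key columns and expected cardinality is to let pandas verify them during the merge. The sketch below uses two hypothetical frames with made-up values; `validate="many_to_one"` raises an error if the lookup table has duplicate keys, and `indicator=True` adds a `_merge` column showing which rows found a match.

```python
import pandas as pd

# Hypothetical frames with made-up values, standing in for the ISLP data
# (many rows per key) and the scraped lookup table (one row per key).
main = pd.DataFrame(
    {
        "region_name": ["North", "South", "South", "East"],
        "sales": [10, 20, 30, 40],
    }
)
lookup = pd.DataFrame(
    {
        "region_name": ["North", "South", "West"],
        "population_2020": [1200, 3450, 900],
    }
)

merged = main.merge(
    lookup,
    on="region_name",
    how="left",               # keep every row of the main dataset
    validate="many_to_one",   # fail loudly if lookup keys are not unique
    indicator=True,           # adds a _merge column: both / left_only
)

print(merged["_merge"].value_counts())

# Keys in the main data that have no partner in the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "region_name"].unique().tolist()
print(unmatched)
```

Unmatched keys are often caused by spelling or whitespace differences, which is worth mentioning in your exposition.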

In [None]:
# Scrape the data


In [None]:
# Clean the data


**🗑️ Delete this instruction cell after completing the instruction below.**

* Present the data, adding the necessary code and markdown cells to your notebook.

In [None]:
# Present the scraped data


The data from Wikipedia complements the ISLP dataset by... (write a brief summary of how the Wikipedia data relates to your ISLP dataset. What new information does it provide? How will it enhance your analysis?).

## Part 4: Visualizing and Analyzing the Data

**🗑️ Delete this instruction cell after completing the instruction below.**

Create at least two unique visualizations. Here are a few ideas:

* A distribution (e.g., histogram, probability density plot, violin plot)
* A relationship (e.g., scatter plot, grouped bar chart)
* A heatmap for trends or geographic data
* A pie chart for proportions

For each visualization, provide an explanation using Markdown cells of what it represents and discuss any implications or insights derived from it.

Feel free to research different types of visualizations that may better suit your data. You are also allowed to use code snippets found online, as long as you adapt them to your own data and provide appropriate attribution in the references section.
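As a starting point, the sketch below draws one distribution and one relationship side by side. It assumes `matplotlib` is available (it is not in the install line above) and uses made-up toy data with hypothetical column names; swap in columns from your own dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, made up purely for illustration.
df = pd.DataFrame(
    {
        "hours": [1, 2, 2, 3, 3, 3, 4, 4, 5, 6],
        "score": [52, 55, 60, 61, 65, 63, 70, 72, 78, 85],
    }
)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Distribution of a single numeric column.
ax_hist.hist(df["hours"], bins=6)
ax_hist.set_title("Distribution of hours")
ax_hist.set_xlabel("hours")
ax_hist.set_ylabel("count")

# Relationship between two numeric columns.
ax_scatter.scatter(df["hours"], df["score"])
ax_scatter.set_title("Score vs. hours")
ax_scatter.set_xlabel("hours")
ax_scatter.set_ylabel("score")

fig.tight_layout()
```

Titles and axis labels with units make the accompanying Markdown explanation much easier to write.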

### Visualization 1

In [None]:
# Code for visualization 1


(Visualization 1 description and analysis)

### Visualization 2

In [None]:
# Code for visualization 2


(Visualization 2 description and analysis)

## Part 5: Executive Summary

**🗑️ Delete this instruction cell after completing the instruction below.**

Write an executive summary with **\~250 words** that:

* Restates the guiding question and **answers it** with evidence from your visuals/tables.  
* Notes **limitations** (data quality, representativeness, causal caveats).  
* Suggests **one next step** (a different dataset or a type of model that could be applied; consider the data science methodology learned in the first course).

(Write an executive summary.)

## Part 6: References

**🗑️ Delete this instruction cell after completing the instruction below.**

Add a **References** section with:

* The ISLP dataset page you used,  
* Wikipedia URL(s) with **date accessed**,  
* Any docs/tutorials,  
* **Links to AI chat transcripts** (if used). 

If you used any code beyond the course scope, include a brief parenthetical citation near that cell (e.g., “Adapted from \[link to chat\] or \[source\]”).

You *may* use AI to assist with code, if cited. Your **exposition must be your own**. Cite all substantive help.

# Submission requirements

**🗑️ Delete this instruction cell after completing the instruction below.**

- Appropriate file name: `lastname-project.ipynb`, replacing `lastname` with your actual last name.
- The title and your name at the top of the notebook.
- Instruction cells deleted so that only your work remains.
- Notebook runs without errors from top to bottom.
- All visualizations are rendered correctly.
- At least two unique visualizations with explanations.
- Approximately 250 words in the executive summary.
- References are properly cited.

Next week, you will present your research and associated dashboard to the class in a Presentation Forum, similar to what you participated in during the first course. Be prepared to discuss your data source, the challenges you faced, and how you solved them.
