## IS-4100: Web Scraping NFL Draft Data

## Overview:
In this lab, you will learn how to gather NFL draft and scouting data from websites through web scraping, perform exploratory data analysis (EDA), and build a machine learning model to predict player success in the NFL. You will use **Python** or **R**: **requests** and **BeautifulSoup** (Python) or **rvest** and **RSelenium** (R) for scraping, **pandas** (Python) or **dplyr** and the wider **tidyverse** (R) for data manipulation, and **scikit-learn** (Python) or **caret** (R) for model building.

## Objectives:
- Learn web scraping techniques using Python or R.
- Perform exploratory data analysis (EDA) on NFL draft data.
- Build a predictive model to analyze draft data and predict player success (e.g., Pro Bowl appearances, number of career games, or player performance ratings).
- Evaluate the model's performance and analyze its implications.

## Instructions:

### Part 1: Web Scraping NFL Draft Data
**Task**: Write a script to scrape NFL draft data. You may use Python or R for this task. Suggested websites include:
- Pro Football Reference (NFL Draft)
- NFL Scouting Data

**For Python**:
- Use libraries such as `BeautifulSoup`, `requests`, and `lxml`.
- Example: Scraping player data using `BeautifulSoup` and cleaning it with `pandas`.
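A minimal Python sketch of the scraping step. It assumes the Pro Football Reference draft page lives at `https://www.pro-football-reference.com/years/<year>/draft.htm` and holds a table with `id="drafts"` whose cells carry `data-stat` attributes such as `draft_pick` and `college_id`. The URL pattern, table id, and attribute names are assumptions to confirm with your browser's developer tools, and `fetch_draft_html` / `parse_draft_table` are hypothetical helper names.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumed mapping from each cell's data-stat attribute to our column name.
# Verify these keys against the live page before relying on them.
FIELDS = {
    "draft_round": "round",
    "draft_pick": "pick",
    "team": "team",
    "player": "player",
    "pos": "position",
    "college_id": "college",
}


def fetch_draft_html(year: int) -> str:
    """Download one year's draft page (the URL pattern is an assumption)."""
    url = f"https://www.pro-football-reference.com/years/{year}/draft.htm"
    response = requests.get(url, headers={"User-Agent": "IS-4100 lab"}, timeout=30)
    response.raise_for_status()
    return response.text


def parse_draft_table(html: str, table_id: str = "drafts") -> pd.DataFrame:
    """Return one row per drafted player from the draft table in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no <table id={table_id!r}> found")
    records = []
    for row in table.select("tbody tr"):
        # Assumed: header rows repeated inside the body are marked class="thead".
        if "thead" in (row.get("class") or []):
            continue
        record = {}
        for stat, column in FIELDS.items():
            cell = row.find(attrs={"data-stat": stat})
            record[column] = cell.get_text(strip=True) if cell else None
        records.append(record)
    return pd.DataFrame(records, columns=list(FIELDS.values()))
```

Usage: `raw = parse_draft_table(fetch_draft_html(2023))`. Keeping download and parsing separate lets you save the HTML once and re-parse it offline while debugging, which also keeps your request count low; check the site's terms of use before scraping repeatedly.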

**For R**:
- Use libraries such as `rvest`, `RSelenium`, and `xml2`.
- Example: Scraping player data with `rvest` and cleaning it with `dplyr`.

You are expected to scrape the following information for at least one NFL draft year:
- Player name
- Position
- College
- Draft round and pick number
- NFL team

**Deliverables**:
- A cleaned `pandas` DataFrame (Python) or a cleaned `tibble` (R) containing the scraped data.
- Save the data to a `.csv` file for further analysis.
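A sketch of the cleaning step, assuming the raw scrape is a DataFrame of text columns named `round`, `pick`, `team`, `player`, `position`, and `college` (these column names are this sketch's choice, not a requirement). It trims whitespace, converts round and pick to nullable integers, and drops leftover header or empty rows; `clean_draft_data` is a hypothetical helper name.

```python
import pandas as pd

TEXT_COLUMNS = ["player", "position", "college", "team"]
INT_COLUMNS = ["round", "pick"]


def clean_draft_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Tidy a scraped draft table so it is ready for EDA and modeling."""
    df = raw.copy()
    for column in TEXT_COLUMNS:
        # Nullable string dtype keeps missing cells as <NA> instead of "None".
        df[column] = df[column].astype("string").str.strip()
    for column in INT_COLUMNS:
        # Non-numeric leftovers (e.g. a stray "Pk" header cell) become NaN,
        # then <NA> once converted to the nullable Int64 dtype.
        numbers = pd.to_numeric(df[column].astype(str).str.strip(), errors="coerce")
        df[column] = numbers.astype("Int64")
    # Rows without a player or pick number are not real draft selections.
    df = df.dropna(subset=["player", "pick"])
    df = df[df["player"] != ""]
    return df.sort_values("pick").reset_index(drop=True)
```

Saving the result for Part 2 is then one call, for example `clean_draft_data(raw).to_csv("nfl_draft_2023.csv", index=False)`.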

### Part 2: Exploratory Data Analysis (EDA)
**Task**: Perform an EDA on the scraped NFL draft data. Analyze the following:
- Distribution of players by position.
- Number of players drafted by round.
- Success metrics such as Pro Bowl appearances, games played, or All-Pro selections, if your data source provides them (you may need to scrape these alongside the Part 1 fields).

**Questions to Explore**:
- Are certain positions more frequently drafted in the early rounds?
- Which colleges or conferences have the highest number of players drafted?
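One way to approach both questions in pandas, assuming a cleaned DataFrame with `position`, integer `round`, and `college` columns. Treating rounds 1 and 2 as "early" is an assumption you can change through the `early_rounds` parameter. Conference counts are not covered here, because the Part 1 fields do not include conference; you would first need your own college-to-conference lookup. All helper names are hypothetical.

```python
import pandas as pd


def position_by_round(df: pd.DataFrame) -> pd.DataFrame:
    """Number of players at each position taken in each round."""
    return pd.crosstab(df["position"], df["round"])


def early_round_share(df: pd.DataFrame, early_rounds: int = 2) -> pd.Series:
    """Fraction of each position's picks made in rounds 1..early_rounds."""
    # Missing rounds become NaN and are skipped by mean().
    is_early = (df["round"] <= early_rounds).astype(float)
    return is_early.groupby(df["position"]).mean().sort_values(ascending=False)


def top_colleges(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Colleges that produced the most drafted players."""
    return df["college"].value_counts().head(n)
```

The share is more informative than raw counts: positions with many roster spots (e.g. wide receivers) can lead in early-round counts simply because more of them are drafted overall.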

**For Python**:
- Use `pandas` and `matplotlib` or `seaborn` for visualizations.
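A minimal matplotlib sketch that draws two of the suggested charts side by side (players per position and players per round) and optionally saves them as an image. It assumes a DataFrame with `position` and `round` columns; `plot_draft_overview` is a hypothetical helper name, and the `Agg` backend is selected so the script also runs without a display.

```python
from typing import Optional, Tuple

import matplotlib

matplotlib.use("Agg")  # draw to image files; no display required
import matplotlib.pyplot as plt
import pandas as pd


def plot_draft_overview(
    df: pd.DataFrame, path: Optional[str] = None
) -> Tuple[pd.Series, pd.Series]:
    """Bar charts of players per position and per round; returns both counts."""
    by_position = df["position"].value_counts()
    by_round = df["round"].value_counts().sort_index()

    fig, (ax_pos, ax_round) = plt.subplots(1, 2, figsize=(11, 4))
    by_position.plot(kind="bar", ax=ax_pos, title="Players drafted by position")
    ax_pos.set_xlabel("Position")
    ax_pos.set_ylabel("Players")
    by_round.plot(kind="bar", ax=ax_round, title="Players drafted by round")
    ax_round.set_xlabel("Round")
    ax_round.set_ylabel("Players")
    fig.tight_layout()

    if path is not None:
        fig.savefig(path, dpi=150)
    plt.close(fig)  # free memory when generating many figures
    return by_position, by_round
```

For example, `plot_draft_overview(clean, "draft_overview.png")` writes the figure for your report and hands back the counts so you can quote them in the text.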

**For R**:
- Use `ggplot2` for visualizations and `dplyr` for data manipulation.

**Deliverables**:
- Visualizations (bar charts, histograms, etc.) of your findings.
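- Summary statistics (mean, median, mode) for key variables such as draft round and a "career value" measure (for example, Pro Football Reference's weighted Approximate Value, `wAV`, if you scraped it).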
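A small sketch of the summary-statistics deliverable. pandas has no single call that reports mean, median, and mode together, so this helper builds a one-row-per-variable table. The column name `wAV` in the usage example is an assumption about what you scraped; `summarize` is a hypothetical helper name.

```python
import pandas as pd


def summarize(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Mean, median, and mode of each numeric column, one row per column."""
    stats = {}
    for column in columns:
        values = pd.to_numeric(df[column], errors="coerce").dropna()
        modes = values.mode()  # may hold several values, sorted ascending
        stats[column] = {
            "mean": values.mean(),
            "median": values.median(),
            # Report the smallest mode when there is a tie.
            "mode": modes.iloc[0] if not modes.empty else float("nan"),
        }
    return pd.DataFrame.from_dict(stats, orient="index")
```

Usage: `summarize(clean, ["round", "pick", "wAV"])`. For draft round the mode is usually less telling than the median, since every round contains roughly the same number of picks.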

### Part 3: Predictive Modeling
**Task**: Build a machine learning model to predict player success based on draft data. You may use either Python or R for model building.

**For Python**:
- Use `scikit-learn` for model building and evaluation.

**For R**:
- Use `caret` for model building and evaluation.

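**Steps**:
1. Define a success target, such as a binary label for "made at least one Pro Bowl" or a numeric target such as career games played.
2. Split the data into training and testing sets.
3. Choose an appropriate model (e.g., logistic regression, decision tree, or random forest for a binary target; a regression model for a numeric target).
4. Train the model and evaluate it on the test set with metrics that match the target: accuracy, precision, and recall for a classification target, or mean absolute error (MAE) and root mean squared error (RMSE) for a numeric target. Because few draftees make the Pro Bowl, report precision and recall rather than accuracy alone.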
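The steps above can be sketched with a scikit-learn pipeline, assuming a binary target column named `made_pro_bowl` (hypothetical; you might derive it as `(df["pro_bowls"] > 0).astype(int)` from an equally hypothetical `pro_bowls` column). A random forest is used here as one of the listed options; `class_weight="balanced"` and a stratified split are choices made because successful players are a small minority.

```python
from typing import Dict, Tuple

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

CATEGORICAL = ["position", "college"]
NUMERIC = ["round", "pick"]


def build_model() -> Pipeline:
    """One-hot encode text columns, pass numbers through, fit a random forest."""
    preprocess = ColumnTransformer([
        ("categorical", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("numeric", "passthrough", NUMERIC),
    ])
    forest = RandomForestClassifier(
        n_estimators=200,
        class_weight="balanced",  # upweight the rare positive class
        random_state=42,
    )
    return Pipeline([("preprocess", preprocess), ("model", forest)])


def train_and_evaluate(
    df: pd.DataFrame, target: str = "made_pro_bowl"
) -> Tuple[Pipeline, Dict[str, float]]:
    X = df[CATEGORICAL + NUMERIC]
    y = df[target]
    # Stratify so both splits keep the same share of successful players.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42
    )
    model = build_model().fit(X_train, y_train)
    predicted = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision_score(y_test, predicted, zero_division=0),
        "recall": recall_score(y_test, predicted, zero_division=0),
    }
    return model, metrics
```

Wrapping encoding and the model in one `Pipeline` means the encoder is fit only on the training split, so no information from the test set leaks into training.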

**Deliverables**:
- Model training and evaluation code.
- A brief explanation of the model's performance and potential ways to improve it.
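When explaining performance, it helps to compare against a trivial baseline. This sketch uses 5-fold stratified cross-validation and balanced accuracy (the mean of per-class recall), under which a model that always predicts "no Pro Bowl" scores exactly 0.5. It assumes a hypothetical binary `made_pro_bowl` column and, by default, uses only `pick` as a feature; `compare_to_baseline` is a hypothetical helper name.

```python
from typing import Dict, Sequence

import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def compare_to_baseline(
    df: pd.DataFrame,
    target: str = "made_pro_bowl",
    features: Sequence[str] = ("pick",),
    folds: int = 5,
) -> Dict[str, float]:
    """Mean cross-validated balanced accuracy for a baseline and a simple model."""
    X = df[list(features)]
    y = df[target]
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=42)
    candidates = {
        "majority class": DummyClassifier(strategy="most_frequent"),
        "logistic regression": LogisticRegression(class_weight="balanced"),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy").mean()
        for name, model in candidates.items()
    }
```

Averaging over folds gives a steadier estimate than a single train/test split on one draft class of a few hundred players, and a richer model that cannot beat pick number alone is a useful finding for the report.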

### Part 4: Analysis and Discussion
**Task**: Analyze the results of your model and discuss the implications. Consider the following:
- What features were most important in predicting success?
- How could the model be improved (e.g., by gathering more data or using more advanced techniques)?
- Discuss potential biases in the data (e.g., players from larger schools being favored) and how they might affect the model.
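Two of these questions can be checked with code. The sketch below uses permutation importance (one technique among several; it measures how much the test score drops when a column is shuffled) to rank features, and computes recall separately for each group to surface possible bias. The group labels, such as a `school_group` column marking larger versus smaller programs, are hypothetical and would have to be built by you; both helper names are hypothetical as well.

```python
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.metrics import recall_score


def rank_features(
    model, X_test: pd.DataFrame, y_test, scoring: str = "balanced_accuracy"
) -> pd.Series:
    """Mean drop in test score when each input column is shuffled."""
    result = permutation_importance(
        model, X_test, y_test, scoring=scoring, n_repeats=20, random_state=42
    )
    return pd.Series(result.importances_mean, index=X_test.columns).sort_values(
        ascending=False
    )


def recall_by_group(y_true, y_pred, groups) -> pd.DataFrame:
    """Recall computed separately for each group label."""
    # list() drops any index so the three inputs line up by position.
    frame = pd.DataFrame(
        {"y_true": list(y_true), "y_pred": list(y_pred), "group": list(groups)}
    )
    rows = {}
    for name, part in frame.groupby("group"):
        rows[name] = {
            "players": len(part),
            "actual_successes": int(part["y_true"].sum()),
            "recall": recall_score(part["y_true"], part["y_pred"], zero_division=0),
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

Because permutation importance works on the raw input columns, a fitted pipeline reports one score per original feature (e.g. `position`) rather than one per one-hot column. A noticeably lower recall for one group suggests the model misses successful players from that group more often; with small groups, treat the gap as a prompt for discussion rather than proof.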

**Deliverables**:
- A short report (2-3 paragraphs) summarizing your analysis and reflections on the results.
