This is the template for the DS3000 final data analysis project. Once you finish, please remove all my instructions. You do not need to follow the structure of the template exactly, but please make sure you include all the components. Write your report in paragraphs. Only use bullet points when listing something (e.g., functions).

# Title 
#### Team number
- List your group members' names here

In [1]:
# Put all the module imports in this code chunk
import matplotlib.pyplot as plt
import pandas as pd


def create_scatter_plots(data):
    """
    Creates scatter plots comparing rarity with HP, defense, and price of Pals.

    Args:
        data (DataFrame): The DataFrame containing Pal data with columns
            'rarity level', 'hp', 'defense', and 'price'.

    Returns:
        None. Displays three scatter plots:
            1. Rarity vs HP
            2. Rarity vs Defense
            3. Rarity vs Price
    """
    # Plot each feature against rarity using the original row pairing.
    # Sorting each column independently would mismatch the x and y values.
    for column, label in [('hp', 'HP'), ('defense', 'Defense'), ('price', 'Price')]:
        plt.scatter(data['rarity level'], data[column])
        plt.title(f'Rarity vs {label}')
        plt.xlabel('Rarity')
        plt.ylabel(label)
        plt.show()
    



## Introduction

- One or two paragraphs about the background of the project, e.g., the background of PalWorld and why your analysis is interesting
- State your research questions. Limit the number of research questions to one or two.

## Data 

### Data Source

- List the website you scraped the data from
- List which information you scraped
- Describe the cleaning you applied to the data

### Webscraping and cleaning functions overview

List all the functions you have written for webscraping and data cleaning. For each one, write one sentence to describe it. 
- `extract_soup()`
    - build url and return soup object
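
As one possible shape for `extract_soup()`, here is a minimal sketch. It assumes the BeautifulSoup (`bs4`) package is installed. The `base_url` and `page` parameters and the `example.com` address are placeholders for illustration and are not part of the template.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def extract_soup(base_url, page=""):
    """Build a URL from its parts, download the page, and return a soup object.

    Args:
        base_url (str): Root address of the site (hypothetical parameter).
        page (str): Relative path of the page to fetch (hypothetical parameter).

    Returns:
        BeautifulSoup: The parsed HTML of the requested page.
    """
    url = urljoin(base_url, page)
    # Some sites reject requests that lack a browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return BeautifulSoup(html, "html.parser")


# Offline check of the two building blocks, so no network access is needed here.
example_url = urljoin("https://example.com/pals/", "some-pal")
example_soup = BeautifulSoup("<h1>Some Pal</h1>", "html.parser")
```

Calling `extract_soup("https://example.com/pals/", "some-pal")` would fetch and parse that placeholder address; swap in the site you actually scrape.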

### Data overview

- Show a couple of rows of the cleaned data you are going to use for the analysis
- State which feature is your target variable (if one exists)
- Give a general summary of the other features
- Discuss any potential problems with the data (e.g., missing values, features that you did not collect but may be important, or any other concerns)
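
One quick way to cover the first and last bullets is to preview the table and count missing values per column. The sketch below uses a made-up three-row DataFrame. The column names and values are placeholders, not real Pal statistics.

```python
import pandas as pd

# Toy data for illustration only; replace with your cleaned DataFrame.
toy = pd.DataFrame(
    {
        "name": ["Pal A", "Pal B", "Pal C"],
        "hp": [70, None, 90],
        "rarity": ["common", "rare", None],
    }
)

# Show the first couple of rows of the data.
preview = toy.head(2)
print(preview)

# Count the missing values in each column.
missing = toy.isna().sum()
print(missing)
```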

## Webscraping and cleaning

In [2]:
# List all the functions you have for webscraping and cleaning. Make sure to write full 
# docstrings for each function
def extract_soup():

    pass

In [3]:
# Write the code to load the data with the functions. You don't need to run the code every time. 
# You can run the code once and save the scraped data into a csv file. Then load the csv file 
# for the rest of the analysis

## Visualizations

### Visualization functions overview
List all the functions you have written for visualization. For each one, write one sentence to describe it. 
- `make_hist()`
    - Generate a histogram with given data and feature
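
A minimal sketch of what `make_hist()` could look like, assuming the feature is a numeric column. The `bins` parameter and the toy data are assumptions added for illustration. Returning the Axes (rather than calling `plt.show()` inside the function) is one design choice that lets the caller adjust the figure afterwards.

```python
import matplotlib.pyplot as plt
import pandas as pd


def make_hist(df, y_feat, bins=10):
    """Plot a histogram of one numeric column and return the Axes.

    Args:
        df (pd.DataFrame): Data containing the column to plot.
        y_feat (str): Name of the numeric column.
        bins (int): Number of histogram bins (hypothetical parameter).

    Returns:
        matplotlib.axes.Axes: The Axes holding the histogram.
    """
    fig, ax = plt.subplots()
    # Drop missing values so they do not interfere with binning.
    ax.hist(df[y_feat].dropna(), bins=bins, edgecolor="black")
    ax.set_title(f"Distribution of {y_feat}")
    ax.set_xlabel(y_feat)
    ax.set_ylabel("Count")
    return ax


# Toy data for illustration only; not real Pal statistics.
toy = pd.DataFrame({"hp": [70, 75, 80, 80, 90, 100, 110]})
ax = make_hist(toy, "hp", bins=4)
```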
 
### Visualization results
- Present 3-4 data visualizations.
- For each visualization, include a title, xlabel, ylabel, and legend (if necessary)
- For each visualization, explain why you made it (how it relates to your research question) and what you learned from it

In [4]:
# List all the functions you have for visualization. Make sure to write full 
# docstrings for each function
def make_hist(df, y_feat):

    pass

#### Visualization 1

In [5]:
# Write the code to run functions to get each data visualization in separate code chunks. 
# Interpret the figures. 

#### Visualization 2

In [6]:
def plot_element_distribution_by_rarity(df):
    """ Plots pie charts showing the percentage of Pals of each element for each rarity.
    Args:
        df (pd.DataFrame): DataFrame with an 'elements' column that contains each Pal's elements and a 'rarity' column that contains each Pal's rarity
    Returns:
        None. Displays 4 pie charts showing the element distribution of the Pals at each rarity
    """

    # Define all the rarity levels possible for a Pal
    rarity_levels = ['common', 'rare', 'epic', 'legendary']

    # Define the color for each available element
    element_colors = {
        'normal': 'lightgrey',
        'water': 'aqua',
        'fire': 'darkorange',
        'leaf': 'forestgreen',
        'dark': 'dimgray',
        'dragon': 'mediumblue',
        'earth': 'lime',
        'ice': 'skyblue',
        'electricity': 'yellow'
    }

    # Create the figure that holds the four subplots
    plt.figure(figsize=(15, 15))

    # Iterate over each rarity level of the Pals
    for i, rarity in enumerate(rarity_levels):
        elements_list = []

        # Iterate over each row in the given DataFrame
        for _, row in df.iterrows():

            # If the row's rarity matches the current rarity level, collect its elements
            if row['rarity'].lower() == rarity:
                elements_list.extend(row['elements'])

        # Calculate the percentage of each element within this rarity
        element_counts = pd.Series(elements_list).value_counts(normalize=True) * 100

        # Look up the color of each element present, in the same order as the counts
        colors = []
        for element in element_counts.index:
            colors.append(element_colors[element])

        # Create the subplot and draw the pie chart
        plt.subplot(2, 2, i + 1)
        plt.pie(element_counts, labels=element_counts.index, autopct='%1.1f%%', colors=colors)
        plt.title(f'Element Distribution for {rarity} Pals')

    plt.show()

# Run the function on our Pals DataFrame
plot_element_distribution_by_rarity(df)

#### Visualization 3

In [7]:
# Write the code to run functions to get each data visualization in separate code chunks. 
# Interpret the figures. 

## Models

### Modeling functions overview
List all the functions you have written for modeling. For each one, write one sentence to describe it. 
- `fit_linear()`
    - fit a linear model to the data and output the r2, slope and intercept
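
A minimal sketch of `fit_linear()` for a single predictor, using NumPy's least-squares polynomial fit rather than any particular modeling library. The toy data lies exactly on the line y = 2x + 1 and is for illustration only.

```python
import numpy as np
import pandas as pd


def fit_linear(df, y_feat, x_feat):
    """Fit y = slope * x + intercept by least squares.

    Args:
        df (pd.DataFrame): Data containing both columns.
        y_feat (str): Name of the target column.
        x_feat (str): Name of the single predictor column.

    Returns:
        tuple: (r2, slope, intercept) of the fitted line.
    """
    x = df[x_feat].to_numpy(dtype=float)
    y = df[y_feat].to_numpy(dtype=float)
    # A degree-1 polynomial fit returns the slope first, then the intercept.
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, slope, intercept


# Toy data for illustration only.
toy = pd.DataFrame({"x": [0, 1, 2, 3], "y": [1, 3, 5, 7]})
r2, slope, intercept = fit_linear(toy, "y", "x")
print(r2, slope, intercept)
```

Note that this R^2 is computed on the same data used for fitting, so it does not replace the cross-validated metrics requested in the model results.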

### Model results

- Present 2-3 models for the analysis.
- Explain any pre-processing steps you have done (eg: scaling, polynomial, dummy features)
- For each model, explain why you think this model is suitable and what metrics you want to use to evaluate the model
    - If it is a classification model, you need to present the confusion matrix, calculate the accuracy, sensitivity and specificity with cross-validation
    - If it is a regression model, you need to present the r2 and MSE with cross-validation
    - If it is a linear regression model/multiple linear regression model, you need to interpret the meaning of the coefficient with the full data
    - If it is a decision tree model, you need to plot the tree with the full data
    - If it is a random forest model, you need to present the feature importance plot with the full data
    - If it is a PCA, you need to explain how you selected the number of components and interpret the key features in the first two components
    - If it is a clustering model, you need to explain how you selected the number of clusters and summarize the clusters
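
For the classification case, the three requested metrics all come from the four cells of a binary confusion matrix. The sketch below computes them in plain Python from true and predicted labels. The function name, the `positive` parameter, and the toy labels are assumptions for illustration. With cross-validation, one option is to apply it to the pooled out-of-fold predictions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Args:
        y_true (list): True class labels.
        y_pred (list): Predicted class labels, in the same order.
        positive: The label treated as the positive class.

    Returns:
        dict: The confusion-matrix counts and the three metrics.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return NaN when a class is absent instead of dividing by zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```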

In [8]:
# List all the functions you have for modeling. Make sure to write full 
# docstrings for each function
def fit_linear(df, y_feat, x_feat):

    pass

#### Model 1

In [9]:
# Write the code to run functions to fit each model in separate code chunks. 
# Interpret the model results. 

#### Model 2

In [10]:
# Write the code to run functions to fit each model in separate code chunks. 
# Interpret the model results. 

#### Model 3

In [11]:
# Write the code to run functions to fit each model in separate code chunks. 
# Interpret the model results. 

## Discussion

- One or two paragraphs summarizing your findings from the modeling section and stating whether the models answer your research question
- Any further work you could do with the analysis (e.g., include more features, get more data, try other models)
- List the contribution of each group member.