## Assignment

You will complete three tasks. Each one asks you to write a function that performs a specific analysis on the provided dataset. Keep the function names, parameters, and return types shown in each template so that an automated test can call your functions.

In [1]:
import pandas as pd
import numpy as np



In [2]:
def create_sample_dataframe():
    """
    Create a simple dataframe with three numerical columns and 10 rows.

    Returns:
        pandas.DataFrame: A dataframe with three columns named "col1", "col2",
        and "col3", each containing 10 random integers from 0 to 99 inclusive
        (the upper bound of np.random.randint is exclusive).
    """
    # Fix the seed so every run produces the same sample data.
    np.random.seed(4)
    data = {
        "col1": np.random.randint(0, 100, 10),
        "col2": np.random.randint(0, 100, 10),
        "col3": np.random.randint(0, 100, 10),
    }
    return pd.DataFrame(data)

sample_df = create_sample_dataframe()
sample_df

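| | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 46 | 55 | 55 |
| 1 | 55 | 55 | 21 |
| 2 | 69 | 57 | 21 |
| 3 | 1 | 36 | 73 |
| 4 | 87 | 50 | 38 |
| 5 | 72 | 44 | 56 |
| 6 | 50 | 38 | 66 |
| 7 | 9 | 52 | 46 |
| 8 | 58 | 3 | 30 |
| 9 | 94 | 0 | 8 |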


In [3]:
# Sample product reviews, one per row of sample_df
reviews = [
    "Great product! I love it!",
    "It's okay, but not what I expected.",
    "Very disappointed with this product.",
    "Excellent quality and value for money.",
    "I would not recommend this product.",
    "It's a good product, but a bit overpriced.",
    "I'm very satisfied with this purchase.",
    "It's a great product for the price.",
    "I'm not sure if I would buy it again.",
    "I'm very happy with this product.",
]


sample_df["reviews"] = reviews

# Display the dataframe
sample_df


| | col1 | col2 | col3 | reviews |
|---|---|---|---|---|
| 0 | 46 | 55 | 55 | Great product! I love it! |
| 1 | 55 | 55 | 21 | It's okay, but not what I expected. |
| 2 | 69 | 57 | 21 | Very disappointed with this product. |
| 3 | 1 | 36 | 73 | Excellent quality and value for money. |
| 4 | 87 | 50 | 38 | I would not recommend this product. |
| 5 | 72 | 44 | 56 | It's a good product, but a bit overpriced. |
| 6 | 50 | 38 | 66 | I'm very satisfied with this purchase. |
| 7 | 9 | 52 | 46 | It's a great product for the price. |
| 8 | 58 | 3 | 30 | I'm not sure if I would buy it again. |
| 9 | 94 | 0 | 8 | I'm very happy with this product. |


### Calculate Correlation:

Write a function that returns the Pearson correlation coefficient between two numeric columns of a DataFrame, rounded to four decimal places.
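
As a cross-check on what the coefficient measures, here is a minimal pure-Python sketch of the Pearson formula applied to the `col1` and `col2` values of the sample dataframe. The helper name `pearson_correlation` is hypothetical and is not part of the assignment template. pandas' `Series.corr` uses the Pearson method by default, so the two approaches should agree.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation of two equal-length numeric sequences (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Values copied from the sample dataframe shown earlier.
col1 = [46, 55, 69, 1, 87, 72, 50, 9, 58, 94]
col2 = [55, 55, 57, 36, 50, 44, 38, 52, 3, 0]

r = pearson_correlation(col1, col2)
print(round(r, 4))  # -0.2658
```

The normalising factors cancel between numerator and denominator, which is why the sketch can work with plain sums of deviations.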

In [4]:
def calculate_correlation(df, col1, col2):
    """
    Calculate the correlation coefficient between two columns in a DataFrame.

    Parameters:
    df (DataFrame): The input DataFrame.
    col1 (str): The name of the first column.
    col2 (str): The name of the second column.

    Returns:
    float: The correlation coefficient.
    """
    # Your code here
    # Series.corr uses the Pearson method by default.
    correlation = df[col1].corr(df[col2])
    return round(correlation, 4)

In [9]:
# Simple test to check if code is correct
calculate_correlation(sample_df, "col1", "col2")
# Expected Output: -0.2658

-0.2658

### Calculate Skewness and Kurtosis:
Write a function to calculate the skewness and kurtosis of a variable in a dataset.

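**Note: By default, `scipy.stats.skew` and `scipy.stats.kurtosis` return biased estimates (`bias=True`), whereas the pandas `skew()` and `kurt()` methods apply a bias correction. Prefer the pandas methods.**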
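
To make the effect of the bias correction concrete, the following pure-Python sketch computes both the uncorrected moment-based estimates and the sample-size-adjusted ones for `col1` of the sample dataframe. The function name `skew_kurt_estimates` is hypothetical. The adjusted formulas are the standard ones that pandas' `skew()` and `kurt()` are understood to use. Treat this as an illustration, not a replacement for the pandas methods.

```python
import math


def skew_kurt_estimates(values):
    """Return biased and bias-adjusted (skewness, excess kurtosis) pairs."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with a 1/n normalisation.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    # Uncorrected (biased) estimates.
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3  # excess kurtosis

    # Sample-size-adjusted estimates (require n > 3).
    G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return {"biased": (g1, g2), "adjusted": (G1, G2)}


# Values copied from col1 of the sample dataframe.
col1 = [46, 55, 69, 1, 87, 72, 50, 9, 58, 94]
estimates = skew_kurt_estimates(col1)
print("biased:  ", estimates["biased"])
print("adjusted:", estimates["adjusted"])
```

With only 10 rows the two variants differ noticeably, so the choice of library function changes the answer an automated test would see.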

In [5]:
def calculate_skewness_kurtosis(df, col):
    """
    Calculate the skewness and kurtosis of a column in a DataFrame.

    Parameters:
    df (DataFrame): The input DataFrame.
    col (str): The name of the column.

    Returns:
    tuple: A tuple containing skewness and kurtosis.
    """
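    # Your code here
    # pandas applies the bias correction for both statistics;
    # kurt() returns excess kurtosis (a normal distribution scores 0).
    skewness = df[col].skew()
    kurtosis = df[col].kurt()
    return round(skewness, 4), round(kurtosis, 4)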

In [8]:
# Simple test to check if code is correct
skew_check, kurt_check = calculate_skewness_kurtosis(sample_df, "col1")
print(f"Skewness: {skew_check}, Kurtosis: {kurt_check}")
# Expected Output: Skewness: -0.6797, Kurtosis: -0.1137

Skewness: -0.6797, Kurtosis: -0.1137


### Sentiment Analysis Summary:
Write a function to perform sentiment analysis on a dataset of text reviews and return the counts of positive, negative, and neutral reviews.
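
The counting step can be reasoned about separately from the scoring step. The sketch below assumes each review has already been given a polarity score in the range [-1, 1] (the range TextBlob uses) and labels it by sign. The helper names and the example scores are hypothetical and are not derived from the sample reviews.

```python
from collections import Counter


def label_polarity(score):
    """Map a polarity score to a sentiment label by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize_polarities(scores):
    """Count labels, always reporting all three keys even when a count is 0."""
    counts = Counter(label_polarity(s) for s in scores)
    return {label: counts.get(label, 0) for label in ("positive", "negative", "neutral")}


# Hypothetical polarity scores, for illustration only.
example_scores = [0.8, 0.0, -0.4, 0.25, -0.1]
summary = summarize_polarities(example_scores)
print(summary)  # {'positive': 2, 'negative': 2, 'neutral': 1}
```

Returning all three keys keeps the dictionary shape stable, which matters when an automated test compares it against an expected dictionary.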

In [6]:
from textblob import TextBlob

def sentiment_analysis_summary(df, text_col):
    """
    Perform sentiment analysis on text data and return the counts of positive, negative, and neutral reviews.

    Parameters:
    df (DataFrame): The input DataFrame.
    text_col (str): The name of the column containing text data.

    Returns:
    dict: A dictionary with counts of positive, negative, and neutral reviews.
    """
    # Your code here
    positive, negative, neutral = 0, 0, 0

    for review in df[text_col]:
        # TextBlob polarity ranges from -1 (negative) to 1 (positive).
        polarity = TextBlob(review).sentiment.polarity
        if polarity > 0:
            positive += 1
        elif polarity < 0:
            negative += 1
        else:
            neutral += 1

    return {"positive": positive, "negative": negative, "neutral": neutral}

In [7]:
# Simple test to check if code is correct
sentiment_analysis_summary(sample_df, "reviews")
# Expected Output: {'positive': 7, 'negative': 2, 'neutral': 1}


{'positive': 7, 'negative': 2, 'neutral': 1}