# COMP1800 - Data Visualization (Coursework)
<u>Introduction</u>: This coursework analyses ChrisCo, a fictional UK cinema chain, through data visualization. The data is compiled and examined to gain insight into the company's operations, customer demographics, and financial performance. This report outlines the steps taken and the results obtained using Python in Visual Studio Code.

#### Setting Up a Python Environment

Setting up a dedicated Python environment is essential for managing dependencies in data science projects. You can use either Anaconda or Miniconda for this purpose.

**Creating and Activating a New Environment**:
```bash
conda create --name COMP1800-DV python=3.10.10
conda activate COMP1800-DV
```

#### Installing Required Packages

Ensure all necessary packages are installed by using a `pip install` command that references a `requirements.txt` file. This file lists all packages needed to run the Jupyter notebook effectively.

**Installing Packages**:
```python
%pip install -r ../Docs/requirements.txt --quiet
```
The `--quiet` flag suppresses routine installation output, which makes any errors easier to spot.

**Downloading Datasets**:
```python
%run ../Datasets/download.py
```
This downloads the datasets listed in the coursework specification into the 'Datasets' directory, as set by the `directory` variable in `../Datasets/download.py`.

**Setting Up an IPython Kernel for Jupyter**:
To use the new Python environment in Jupyter, make sure `ipykernel` is installed in it and register an IPython kernel for the environment.
```bash
# Install ipykernel into the environment first, then register the kernel
conda install -n COMP1800-DV ipykernel --update-deps --force-reinstall
python -m ipykernel install --user --name=COMP1800-DV --display-name "COMP1800-DV(IPYNB)"
```
This creates a kernel named `COMP1800-DV` for Jupyter, ensuring it uses the specific Python environment created for this coursework.
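
Once the kernel is selected, a quick check from inside the notebook can confirm which interpreter is actually running. The following is a minimal sketch, not part of the original coursework: the helper name `describe_environment` is hypothetical, and `CONDA_DEFAULT_ENV` is only set when the process inherits an activated conda shell, so the executable path is usually the more reliable signal.

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the interpreter running this code."""
    return {
        # Path of the Python binary; it should sit inside the conda environment.
        "executable": sys.executable,
        # Interpreter version formatted as "major.minor.micro".
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Set by `conda activate`; may be None when Jupyter launches the kernel.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the executable path does not point into the `COMP1800-DV` environment, the notebook is probably attached to a different kernel.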

In [11]:
%pip install -r ../Docs/requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.


## Data Collection
Access and download your own datasets from the provided links, replacing 'ID' in each link with your student ID number.

In [12]:
%run ../Datasets/download.py

Downloaded and saved CinemaWeeklyVisitors.csv successfully.
Downloaded and saved CinemaAge.csv successfully.
Downloaded and saved CinemaCapacity.csv successfully.
Downloaded and saved CinemaMarketing.csv successfully.
Downloaded and saved CinemaOverheads.csv successfully.
Downloaded and saved CinemaSpend.csv successfully.
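
The contents of `download.py` are not shown in this report. As a rough illustration of what such a script might do, the sketch below saves each file and prints the same success message as the log above. Everything in it is an assumption: the function names `fetch_bytes` and `download_datasets` are hypothetical, `base_url` is a placeholder rather than the real coursework link, and the standard-library `urllib` is used in place of whatever HTTP client the actual script relies on.

```python
import os
import urllib.request

# Filenames taken from the download log above.
FILENAMES = [
    "CinemaWeeklyVisitors.csv",
    "CinemaAge.csv",
    "CinemaCapacity.csv",
    "CinemaMarketing.csv",
    "CinemaOverheads.csv",
    "CinemaSpend.csv",
]


def fetch_bytes(url):
    """Default fetcher: return the raw response body for `url`."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_datasets(base_url, directory, filenames=FILENAMES, fetch=fetch_bytes):
    """Save each file under `directory` and return the list of paths written.

    `base_url` is a placeholder for the coursework link with 'ID' already
    replaced; it is an assumption, not the real address.
    """
    os.makedirs(directory, exist_ok=True)
    saved = []
    for name in filenames:
        try:
            content = fetch(f"{base_url}/{name}")
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            print(f"Failed to download {name}: {error}")
            continue
        path = os.path.join(directory, name)
        with open(path, "wb") as handle:
            handle.write(content)
        print(f"Downloaded and saved {name} successfully.")
        saved.append(path)
    return saved
```

Passing the fetcher as a parameter keeps the file-writing logic testable without network access; a failed download is reported and skipped rather than aborting the remaining files.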


## Importing Libraries
Import the libraries needed for this coursework.

In [7]:
try:
    import requests
    import pandas as pd
    import os
except ImportError as e:
    # Report which package is missing so it can be installed from requirements.txt
    print(f"Import error: {e}")

## Loading and Inspecting the Dataset
Six datasets covering weekly visitors, average visitor age, seating capacity, marketing expenditure, overheads, and average customer spend are loaded for subsequent analysis.

In [9]:
# Load the datasets
directory = '../Datasets/'
age_df = pd.read_csv(f'{directory}CinemaAge.csv')
capacity_df = pd.read_csv(f'{directory}CinemaCapacity.csv')
marketing_df = pd.read_csv(f'{directory}CinemaMarketing.csv')
overheads_df = pd.read_csv(f'{directory}CinemaOverheads.csv')
spend_df = pd.read_csv(f'{directory}CinemaSpend.csv')
weekly_visitors_df = pd.read_csv(f'{directory}CinemaWeeklyVisitors.csv')

# Display the first few rows of each dataframe to understand their structure
(age_df.head(), capacity_df.head(), marketing_df.head(), overheads_df.head(), spend_df.head(), weekly_visitors_df.head())

(    Id  Avg age (yrs)
 0  UDD             27
 1  CCX             38
 2  VJV             41
 3  WVA             45
 4  AKA             26,
     Id  Seating capacity
 0  UDD               163
 1  CCX                30
 2  VJV               449
 3  WVA               181
 4  AKA                43,
     Id  Marketing (£000s)
 0  UDD                  5
 1  CCX                  2
 2  VJV                 13
 3  WVA                 24
 4  AKA                  2,
     Id  Overheads (£000s)
 0  UDD                 65
 1  CCX                 18
 2  VJV                 87
 3  WVA                 58
 4  AKA                 13,
     Id  Avg spend (£)
 0  UDD             15
 1  CCX             19
 2  VJV             15
 3  WVA             15
 4  AKA             12,
          Date  UDD  CCX   VJV   WVA  AKA  JJQ  SJE  WQW  ZWY  ...  TJN  TPY  \
 0  2019-01-01  372    0   845   923    0  163  314  160  191  ...  411  436   
 1  2019-01-08  378    0  1012   725    0  148  303  195  165  ...  442  444   
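
The `Date` column is read as plain text by `pd.read_csv`, and some cinemas (CCX and AKA in the rows shown) report zero visitors. The sketch below illustrates one way to parse the dates, confirm the weekly spacing, and list cinemas with only zero counts. It uses a small sample copied from the output above rather than the full file, and the helper name `check_weekly_dates` is hypothetical.

```python
import pandas as pd

# Two rows and three cinemas copied from the output above; the real file has more.
sample = pd.DataFrame(
    {
        "Date": ["2019-01-01", "2019-01-08"],
        "UDD": [372, 378],
        "CCX": [0, 0],
        "VJV": [845, 1012],
    }
)


def check_weekly_dates(df, date_column="Date"):
    """Return a copy with parsed dates, raising if the spacing is not 7 days."""
    result = df.copy()
    result[date_column] = pd.to_datetime(result[date_column])
    gaps = result[date_column].diff().dropna()
    if not (gaps == pd.Timedelta(days=7)).all():
        raise ValueError("Dates are not spaced exactly one week apart")
    return result


checked = check_weekly_dates(sample)
# Cinemas whose visitor counts are all zero in this sample.
zero_only = [name for name in checked.columns[1:] if (checked[name] == 0).all()]
print(checked.dtypes["Date"], zero_only)
```

Whether an all-zero column means a closed cinema or missing data cannot be determined from the sample alone, so it is worth checking before plotting.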

## Creating Dataframes
**Summary DataFrame**: This includes one row for each cinema, with details such as average age of visitors, seating capacity, marketing spend, overheads, and average spend per visitor.

**Customer DataFrame**: This is derived from the weekly visitors data, reshaped into long format so that each row holds a date, a cinema ID, and that cinema's visitor count for the week.

In [10]:
# Create the summary dataframe by merging the individual dataframes on the 'Id' column
summary_df = pd.merge(age_df, capacity_df, on='Id', how='inner')
summary_df = pd.merge(summary_df, marketing_df, on='Id', how='inner')
summary_df = pd.merge(summary_df, overheads_df, on='Id', how='inner')
summary_df = pd.merge(summary_df, spend_df, on='Id', how='inner')

# Rename columns for clarity
summary_df.columns = ['Cinema ID', 'Average Age (Years)', 'Seating Capacity', 'Marketing Spend (£000s)', 'Overheads (£000s)', 'Average Spend (£)']

# Reshape the weekly visitors dataframe to long format: one row per (date, cinema) pair
customer_df = weekly_visitors_df.melt(id_vars=["Date"], var_name="Cinema ID", value_name="Weekly Visitors")

(summary_df.head(), customer_df.head())

(  Cinema ID  Average Age (Years)  Seating Capacity  Marketing Spend (£000s)  \
 0       UDD                   27               163                        5   
 1       CCX                   38                30                        2   
 2       VJV                   41               449                       13   
 3       WVA                   45               181                       24   
 4       AKA                   26                43                        2   
 
    Overheads (£000s)  Average Spend (£)  
 0                 65                 15  
 1                 18                 19  
 2                 87                 15  
 3                 58                 15  
 4                 13                 12  ,
          Date Cinema ID  Weekly Visitors
 0  2019-01-01       UDD              372
 1  2019-01-08       UDD              378
 2  2019-01-15       UDD              360
 3  2019-01-22       UDD              347
 4  2019-01-29       UDD              387)
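
The merge and reshape steps above can be made a little more defensive. The sketch below is an illustrative variant, not the notebook's own code: it chains the merges with `functools.reduce`, asks pandas to verify that `Id` is unique on both sides via `validate="one_to_one"`, renames columns by name rather than by position, and checks the row count after `melt`. The data is a two-row sample copied from the outputs above, and `merge_on_id` is a hypothetical helper name.

```python
from functools import reduce

import pandas as pd

# First two rows of each table, copied from the outputs above.
age = pd.DataFrame({"Id": ["UDD", "CCX"], "Avg age (yrs)": [27, 38]})
capacity = pd.DataFrame({"Id": ["UDD", "CCX"], "Seating capacity": [163, 30]})
spend = pd.DataFrame({"Id": ["UDD", "CCX"], "Avg spend (£)": [15, 19]})
weekly = pd.DataFrame(
    {"Date": ["2019-01-01", "2019-01-08"], "UDD": [372, 378], "CCX": [0, 0]}
)


def merge_on_id(frames):
    """Inner-join all frames on 'Id'; pandas raises MergeError on duplicate Ids."""
    return reduce(
        lambda left, right: pd.merge(
            left, right, on="Id", how="inner", validate="one_to_one"
        ),
        frames,
    )


# Renaming by name keeps working even if the merge order changes.
summary = merge_on_id([age, capacity, spend]).rename(
    columns={
        "Id": "Cinema ID",
        "Avg age (yrs)": "Average Age (Years)",
        "Seating capacity": "Seating Capacity",
        "Avg spend (£)": "Average Spend (£)",
    }
)

customer = weekly.melt(
    id_vars=["Date"], var_name="Cinema ID", value_name="Weekly Visitors"
)
# Long format holds one row per (date, cinema) pair.
assert len(customer) == len(weekly) * (weekly.shape[1] - 1)
```

Assigning to `summary_df.columns` by position, as in the cell above, silently mislabels columns if the merge order ever changes; a name-based mapping avoids that risk.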