# Data Analyst Professional Practical Exam Submission

**You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.**

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the [Markdown Guide](https://s3.amazonaws.com/talent-assets.datacamp.com/Markdown+Guide.pdf) before you start.


## 📝 Task List

Your written report should include written text summaries and graphics of the following:
- Data validation:   
  - Describe validation and cleaning steps for every column in the data 
- Exploratory Analysis:  
  - Include two different graphics showing single variables only to demonstrate the characteristics of data  
  - Include at least one graphic showing two or more variables to represent the relationship between features
  - Describe your findings
- Definition of a metric for the business to monitor  
  - How should the business use the metric to monitor the business problem
  - Can you estimate initial value(s) for the metric based on the current data
- Final summary including recommendations that the business should undertake


In [16]:
# import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [17]:
# read data
df = pd.read_csv('product_sales.csv')
df.head()

| | week | sales_method | customer_id | nb_sold | revenue | years_as_customer | nb_site_visits | state |
|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Email | 2e72d641-95ac-497b-bbf8-4861764a7097 | 10 | NaN | 0 | 24 | Arizona |
| 1 | 6 | Email + Call | 3998a98d-70f5-44f7-942e-789bb8ad2fe7 | 15 | 225.47 | 1 | 28 | Kansas |
| 2 | 5 | Call | d1de9884-8059-4065-b10f-86eef57e4a44 | 11 | 52.55 | 6 | 26 | Wisconsin |
| 3 | 4 | Email | 78aa75a4-ffeb-4817-b1d0-2f030783c5d7 | 11 | NaN | 3 | 25 | Indiana |
| 4 | 3 | Email | 10e6d446-10a5-42e5-8210-1b5438f70922 | 9 | 90.49 | 0 | 28 | Illinois |


## Data Validation and Cleaning

- Check for missing values and duplicates
- Handle missing values by either removing the rows or filling them with appropriate values
- Identify duplicates and decide whether to remove them or keep them
- Check for data types and convert them if necessary
- Check for outliers and decide whether to remove them or keep them
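
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the final cleaning code: the helper name `summarize_quality` is hypothetical, the toy `sample` frame stands in for `product_sales.csv`, and the 1.5 × IQR rule is only one common assumption for flagging outliers.

```python
import numpy as np
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> dict:
    """Hypothetical helper: collect basic data-quality counts for a DataFrame."""
    numeric = df.select_dtypes(include="number")

    # Flag values outside 1.5 * IQR of each numeric column (an assumed rule of thumb).
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < (q1 - 1.5 * iqr)) | (numeric > (q3 + 1.5 * iqr))

    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "outliers_per_column": outside.sum().to_dict(),
    }


# Toy data for illustration only; the last row duplicates the one before it.
sample = pd.DataFrame({
    "week": [2, 6, 5, 5],
    "sales_method": ["Email", "Email + Call", "Call", "Call"],
    "nb_sold": [10, 15, 11, 11],
    "revenue": [np.nan, 225.47, 52.55, 52.55],
})

report = summarize_quality(sample)
for name, value in report.items():
    print(name, value)
```

Whether flagged rows are dropped, filled, or kept is a separate decision; the summary only reports what is there.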

### Columns
**week** (int): Week number (1-52) <br>
**sales_method** (categorical): Sales method (Email, Call, Email + Call) <br>
**customer_id** (string): Unique identifier for each customer <br>
**nb_sold** (int): Number of products sold <br>
**revenue** (float): Revenue generated from sales <br>
**years_as_customer** (int): Number of years the customer has been a customer <br>
**nb_site_visits** (int): Number of site visits made by the customer <br>
**state** (categorical): State of the customer (New York, California, Texas, etc.)
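
The column descriptions above can be turned into simple per-column checks. The sketch below is one possible way to do that, under stated assumptions: `validate_columns` and `EXPECTED_METHODS` are hypothetical names, the toy rows (including the mislabelled `em + call` and the out-of-range week) are made up for illustration, and the 1-52 range is taken from the description rather than verified against the data.

```python
import pandas as pd

# Labels taken from the sales_method description above.
EXPECTED_METHODS = {"Email", "Call", "Email + Call"}


def validate_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: return human-readable issues found in the frame."""
    issues = []

    # week: range as stated in the column description (an assumption about the data).
    if not df["week"].between(1, 52).all():
        issues.append("week has values outside 1-52")

    # sales_method: only the three documented labels are expected.
    unexpected = set(df["sales_method"].dropna().unique()) - EXPECTED_METHODS
    if unexpected:
        issues.append(f"sales_method has unexpected labels: {sorted(unexpected)}")

    # customer_id: described as a unique identifier.
    if df["customer_id"].duplicated().any():
        issues.append("customer_id contains duplicates")

    # Counts and revenue should not be negative.
    for col in ["nb_sold", "years_as_customer", "nb_site_visits", "revenue"]:
        if (df[col].dropna() < 0).any():
            issues.append(f"{col} contains negative values")

    return issues


# Toy data for illustration only.
sample = pd.DataFrame({
    "week": [2, 6, 53],
    "sales_method": ["Email", "em + call", "Call"],
    "customer_id": ["a", "b", "b"],
    "nb_sold": [10, 15, 11],
    "revenue": [None, 225.47, 52.55],
    "years_as_customer": [0, 1, 6],
    "nb_site_visits": [24, 28, 26],
    "state": ["Arizona", "Kansas", "Wisconsin"],
})

issues = validate_columns(sample)
for issue in issues:
    print(issue)
```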


In [18]:
print('Total number of rows:', df.shape[0]) 
print(df.info())

Total number of rows: 15000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   week               15000 non-null  int64  
 1   sales_method       15000 non-null  object 
 2   customer_id        15000 non-null  object 
 3   nb_sold            15000 non-null  int64  
 4   revenue            13926 non-null  float64
 5   years_as_customer  15000 non-null  int64  
 6   nb_site_visits     15000 non-null  int64  
 7   state              15000 non-null  object 
dtypes: float64(1), int64(4), object(3)
memory usage: 937.6+ KB
None
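
The output above shows that `revenue` is the only column with missing values: 13926 of 15000 rows are non-null, so 1074 are missing. One option for handling them, sketched below on toy data, is to fill each gap with the median revenue of the same `sales_method`. This is an assumption for illustration, not necessarily the approach used in this report; the `revenue_filled` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the real frame comes from product_sales.csv.
sample = pd.DataFrame({
    "sales_method": ["Email", "Email", "Email", "Call", "Call", "Call"],
    "revenue": [90.0, np.nan, 100.0, 50.0, 54.0, np.nan],
})

# Count missing revenue values before filling.
n_missing = int(sample["revenue"].isna().sum())

# Median revenue per sales method, aligned row by row with the original frame.
group_median = sample.groupby("sales_method")["revenue"].transform("median")

# Fill only the missing entries; existing values are left untouched.
sample["revenue_filled"] = sample["revenue"].fillna(group_median)

print("Missing revenue values:", n_missing)
print(sample)
```

Filling by group keeps the differences between sales methods intact, which a single overall median would blur. Dropping the rows is the simpler alternative if the missing share is judged small enough.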


In [None]:
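# list the categorical (object-dtype) columns
cat_cols = df.select_dtypes(include=['object']).columns.tolist()
print('Categorical columns:', cat_cols)

# customer_id is a unique identifier, so a count plot of it is not informative
plot_cols = [col for col in cat_cols if col != 'customer_id']
fig, ax = plt.subplots(1, len(plot_cols), figsize=(15, 15))
for i, col in enumerate(plot_cols):
    sns.countplot(x=col, data=df, ax=ax[i])
    ax[i].set_title(col)
    ax[i].set_xlabel('')
    ax[i].set_ylabel('')
    # rotate tick labels on this subplot; plt.xticks would only affect the current axes
    ax[i].tick_params(axis='x', rotation=45)
# adjust the layout once, after all subplots are drawn
plt.tight_layout()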


Categorical columns: ['sales_method', 'customer_id', 'state']


## ✅ When you have finished...
-  Publish your Workspace using the option on the left
-  Check the published version of your report:
    -  Can you see everything you want us to grade?
    -  Are all the graphics visible?
-  Review the grading rubric. Have you included everything that will be graded?
-  Head back to the [Certification Dashboard](https://app.datacamp.com/certification) to submit your practical exam report and record your presentation