# Data Analyst Professional Practical Exam Submission

**You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.**

You can use any markdown formatting you wish. If you are not familiar with Markdown, read the [Markdown Guide](https://s3.amazonaws.com/talent-assets.datacamp.com/Markdown+Guide.pdf) before you start.


## 📝 Task List

Your written report should include written text summaries and graphics of the following:
- Data validation:   
  - Describe validation and cleaning steps for every column in the data 
- Exploratory Analysis:  
  - Include two different graphics showing single variables only to demonstrate the characteristics of data  
  - Include at least one graphic showing two or more variables to represent the relationship between features
  - Describe your findings
- Definition of a metric for the business to monitor  
  - How should the business use the metric to monitor the business problem
  - Can you estimate initial value(s) for the metric based on the current data
- Final summary including recommendations that the business should undertake


**Data Validation**

The dataset contains 15,000 rows and 8 columns before validation and cleaning. I validated all of the columns against the criteria in the provided dataset table. Details on how each column was validated and cleaned (if necessary) are included below. Note that the columns were validated and cleaned in the order they are listed.

- revenue: values are a numeric data type with two decimal places as specified. Revenue was the only column with null values. Before cleaning, there were 1,074 null values. Since the number of null values was relatively low, I filtered out all rows where revenue was null. After removing rows with null revenues, there were 13,926 rows in the dataset.
- week: values are a numeric data type and range from 1 to 6 as specified. No cleaning needed.
- sales_method: values are a character data type as specified. Before cleaning, the column contained two extra categories, 'em + call' and 'email', which are misspellings of 'Email + Call' and 'Email'. I recoded them to the correct spellings so that sales_method contains only the three expected unique values.
- customer_id: values are a character data type as specified. There are 13,926 unique customer ids. No cleaning needed.
- nb_sold: values are a numeric data type as specified. Values are whole numbers and no cleaning is needed.
- years_as_customer: values are a numeric data type as specified. Since the company was founded in 1984, no customer can have been with the company for more than 39 years. Before cleaning, the maximum years_as_customer value was 63, which is not possible. I filtered out all rows where years_as_customer was greater than 39, which removed 2 rows. After cleaning, there were 13,924 rows in the dataset.
- nb_site_visits: values are a numeric data type as specified. No cleaning needed.
- state: values are a character data type as specified. There are 50 unique values representing each state. No cleaning needed.

After data validation and cleaning, the dataset contains 13,924 rows and 8 columns with no null values.
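
The three cleaning steps described above can be sketched in plain Python. This is a minimal illustration, not the exact code used for the report: the rows are made-up toy values, and the names `METHOD_FIXES`, `MAX_YEARS`, and `clean` are hypothetical helpers introduced only for this example.

```python
# Toy rows for illustration only; they are NOT taken from the real dataset.
rows = [
    {"sales_method": "Email", "revenue": 95.58, "years_as_customer": 3},
    {"sales_method": "em + call", "revenue": 180.10, "years_as_customer": 1},
    {"sales_method": "email", "revenue": None, "years_as_customer": 5},
    {"sales_method": "Call", "revenue": 48.20, "years_as_customer": 63},
    {"sales_method": "Email + Call", "revenue": 185.00, "years_as_customer": 39},
]

# Hypothetical mapping for the two misspelled categories described above.
METHOD_FIXES = {"em + call": "Email + Call", "email": "Email"}
MAX_YEARS = 39  # cutoff used in this report (company founded in 1984)


def clean(rows):
    """Apply the cleaning steps in the order the report lists them."""
    cleaned = []
    for row in rows:
        # 1. Drop rows with a null revenue.
        if row["revenue"] is None:
            continue
        fixed = dict(row)
        # 2. Recode misspelled sales_method values; leave valid ones unchanged.
        fixed["sales_method"] = METHOD_FIXES.get(
            row["sales_method"], row["sales_method"]
        )
        # 3. Drop rows whose tenure exceeds the age of the company.
        if fixed["years_as_customer"] > MAX_YEARS:
            continue
        cleaned.append(fixed)
    return cleaned


cleaned_rows = clean(rows)
print(len(cleaned_rows), sorted({r["sales_method"] for r in cleaned_rows}))
```

A row with exactly 39 years is kept, since only values greater than 39 are filtered out.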

**Customers per Approach**

6,921 customers were emailed, 4,780 were called, and 2,223 were both emailed and called.

![](customers_per_approach.png)
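
Counts like these can be produced with a simple tally of the sales_method column. The sketch below uses `collections.Counter` on a few made-up values (not the real data), and then checks that the three counts reported above add up to the 13,924 rows left after cleaning.

```python
from collections import Counter

# Toy sales_method values for illustration only (not the real data).
methods = ["Email", "Call", "Email", "Email + Call", "Call", "Email"]

customers_per_method = Counter(methods)
for method, count in customers_per_method.most_common():
    print(f"{method}: {count}")

# Consistency check on the figures reported in this section:
# the three groups should sum to the cleaned row count.
reported_total = 6_921 + 4_780 + 2_223
print(reported_total == 13_924)
```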


**Spread of Revenue**

To visualize the spread of revenue, I used a histogram. Overall, revenue ranges from 32.54 to 238.32 US Dollars (USD), with the majority of revenues being less than 120 USD.

![](revenue_dist_all.png)

Comparing the distribution of revenue across sales methods shows that calls generated the lowest revenues, emails generated moderate revenues, and the combined email and call method generated the highest revenues. The majority of sales were made by the call and email methods, shown in orange and green, respectively.

![](revenue_dist_by_salesmethod.png)


**Revenue Over Time**

As shown in the graph below, sales made by email generated much more revenue than the other sales methods when the new product line was first launched. However, the weekly revenue generated by email alone dropped quickly and continued to trend downward over the next five weeks.

Sales made by calling generated slightly more revenue in the first week than the email and call method. The weekly revenue from calling increased slowly through week 5, but it was consistently at or near the lowest of the three methods.

The email and call method generated the lowest revenue in week 1. However, the weekly revenue from this sales method increased at a much greater rate than the revenue from calls alone. By week 5, the weekly revenue from emails and calls had surpassed that of the other sales methods. Even though weekly revenue decreased from week 5 to week 6 for all of the sales methods, the revenue from emails and calls in week 6 was still the largest.

![](revenue_over_time_per_week.png)
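
The weekly series plotted above is a sum of revenue grouped by week and sales method. A minimal sketch of that aggregation, using made-up `(week, sales_method, revenue)` rows rather than the real data:

```python
from collections import defaultdict

# Toy (week, sales_method, revenue) rows; illustrative only.
sales = [
    (1, "Email", 100.0),
    (1, "Call", 40.0),
    (1, "Email", 90.0),
    (2, "Email + Call", 180.0),
    (2, "Call", 45.0),
    (2, "Email + Call", 170.0),
]

# Sum revenue for each (week, sales_method) pair.
weekly_revenue = defaultdict(float)
for week, method, revenue in sales:
    weekly_revenue[(week, method)] += revenue

for (week, method), total in sorted(weekly_revenue.items()):
    print(f"week {week} | {method}: {total:.2f}")
```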

Another way to compare the three sales methods is to plot the cumulative revenue generated over time. Calls generated the lowest total revenue, emails and calls generated the second highest total revenue, and emails generated the highest total revenue. The email sales method seems to have been the best sales method at the beginning of the new product line launch, but its cumulative revenue increased more slowly as time went on. Meanwhile, the cumulative revenue from emails and calls increased at a greater rate than the other sales methods. If one sales method should be used going forward, it should be emails and calls.

![](cumulative_revenue_over_time.png)
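
Cumulative revenue is a running sum of the weekly totals for each method. The sketch below uses `itertools.accumulate` on made-up weekly values, chosen only to mimic the pattern described above: one method starts strong and flattens, while the other starts low and climbs faster in later weeks.

```python
from itertools import accumulate

# Toy weekly revenue totals for weeks 1-6; illustrative only.
weekly = {
    "Email": [50.0, 30.0, 20.0, 20.0, 15.0, 5.0],
    "Email + Call": [5.0, 10.0, 20.0, 30.0, 40.0, 30.0],
}

# Running total per method: element i is the revenue earned through week i + 1.
cumulative = {method: list(accumulate(values)) for method, values in weekly.items()}

for method, totals in cumulative.items():
    print(method, totals)
```

In this toy example the email-only method still has the higher total after week 6, yet the combined method gains more in each of the last three weeks. That is the same distinction the report draws between total revenue to date and the rate of growth.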

**Business Metrics**

The goal is to make sure the best technique is being used to sell the new product line effectively, so I recommend using the weekly revenue generated by the email and call sales method as the metric.

The revenue generated by the email and call sales method in week 6 was 111,152.07 USD, which can serve as the initial value for the metric. If weekly revenue continues to generally increase from this baseline, it indicates that the email and call sales method is effective at selling the new product line.
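
One way to monitor whether the metric keeps increasing is to track its week-over-week percentage change. This is an assumption about how the business might operationalize the metric, not something computed in this report. The function name `week_over_week_change` is hypothetical and the input values are made up.

```python
def week_over_week_change(weekly_revenue):
    """Return the percentage change between each pair of consecutive weeks."""
    return [
        (curr - prev) / prev * 100
        for prev, curr in zip(weekly_revenue, weekly_revenue[1:])
    ]


# Toy weekly revenue values; only the relative movement matters here.
history = [100.0, 150.0, 75.0]
changes = week_over_week_change(history)
print(changes)
```

A positive value means the metric grew compared with the previous week. A run of negative values would suggest revisiting the sales approach.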


**Recommendations**

Although the email sales method was very effective at selling the new product line at the beginning of its launch, its effectiveness decreased sharply over the following weeks, as shown by the revenue it generated per week. The email and call method did not earn as much revenue at the start of the product line launch, but the revenue it generated per week increased at a greater rate than that of the other two sales methods. If one sales method should be used for this new product line going forward, **I would recommend the email and call method.**

Additionally, I recommend abandoning the call-only sales method for this product line. Sales made by calls alone generated the lowest total revenue, and the method is less efficient than the email and call method. With the email and call method, the follow-up phone call takes an average of only 10 minutes instead of 30 minutes.