# Data Analyst Professional Practical Exam Submission


## Data Validation

The dataset contains **15,000 rows and 8 columns** before cleaning and validation. I have validated all the columns against the criteria in the dataset table:

- week: this column contains integer values ranging from 1 to 6, representing each week. It has no missing values, so no modifications were required.
- sales_method: initially contained 5 unique values ('Email', 'Call', 'Email + Call', 'email', 'em + call') because of inconsistent labeling. I standardized capitalization and mapped the variant labels to their proper categories, leaving 3 final categories ('Email', 'Call', 'Email + Call') with no missing values. Valid after cleaning.
- customer_id: character values without missing values. I checked for duplicates and found none, so each value is unique per customer, as expected. No cleaning was needed.
- nb_sold: numeric values without missing values, same as the description. All values are positive and within an expected range. No cleaning is needed.
- revenue: numeric values with 1,074 missing values. For each sales method, I calculated the median revenue per unit sold and imputed each missing value as that median multiplied by the row's nb_sold, so the imputed revenue scales with the number of units sold. Valid after imputation.
- years_as_customer: numeric values without missing values, but 2 rows contained invalid entries. Since the company was founded in 1984, the maximum possible value is 41 years (taking 2025 as the current year). However, two records had values of 47 and 63 years, which are impossible. To ensure data accuracy, these two rows were removed from the dataset.
- nb_site_visits: numeric values without missing values, same as the description. All values fall within a reasonable range. No changes were necessary.
- state: 50 unique states without missing values, same as the description. I checked for inconsistent spellings and found none. No changes were necessary.

After the data validation, the dataset contains **14,998 rows and 8 columns**, ensuring accuracy, consistency, and completeness for analysis.
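The three cleaning steps that changed the data (label standardization, removal of impossible tenures, and revenue imputation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the exact code used for the analysis: the `LABEL_MAP` dictionary, the `clean_rows` helper, the order of the steps, the rounding to two decimals, and the five sample rows are all hypothetical, and 2025 is assumed as the analysis year.

```python
from statistics import median

# Hypothetical mapping from lower-cased raw labels to the three valid categories.
LABEL_MAP = {
    "email": "Email",
    "call": "Call",
    "email + call": "Email + Call",
    "em + call": "Email + Call",
}
MAX_YEARS = 2025 - 1984  # 41; assumes the analysis year is 2025


def clean_rows(rows):
    """Return cleaned copies of rows; each row is a dict keyed by column name."""
    # 1. Drop impossible tenures and standardize the sales_method label.
    cleaned = []
    for row in rows:
        if row["years_as_customer"] > MAX_YEARS:
            continue
        fixed = dict(row)  # copy so the input rows are left untouched
        fixed["sales_method"] = LABEL_MAP[row["sales_method"].strip().lower()]
        cleaned.append(fixed)

    # 2. Median revenue per unit sold for each method, from rows with known revenue.
    per_unit = {}
    for row in cleaned:
        if row["revenue"] is not None:
            ratio = row["revenue"] / row["nb_sold"]
            per_unit.setdefault(row["sales_method"], []).append(ratio)
    median_per_unit = {method: median(vals) for method, vals in per_unit.items()}

    # 3. Impute missing revenue as (median revenue per unit) * nb_sold.
    for row in cleaned:
        if row["revenue"] is None:
            estimate = median_per_unit[row["sales_method"]] * row["nb_sold"]
            row["revenue"] = round(estimate, 2)
    return cleaned


# Made-up rows for illustration only; they are not taken from the dataset.
sample = [
    {"sales_method": "Email", "nb_sold": 10, "revenue": 100.0, "years_as_customer": 3},
    {"sales_method": "email", "nb_sold": 8, "revenue": None, "years_as_customer": 1},
    {"sales_method": "em + call", "nb_sold": 12, "revenue": 180.0, "years_as_customer": 5},
    {"sales_method": "Email + Call", "nb_sold": 10, "revenue": None, "years_as_customer": 2},
    {"sales_method": "Call", "nb_sold": 9, "revenue": 45.0, "years_as_customer": 63},
]
cleaned = clean_rows(sample)
for row in cleaned:
    print(row["sales_method"], row["nb_sold"], row["revenue"])
```

In this sketch a method with no observed revenue at all would raise a `KeyError` during imputation, which is acceptable here because every method in the dataset has rows with known revenue.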




## How Many Customers Were There for Each Approach?
As the count plot shows, Email had the highest number of customers, with approximately 7,500 transactions, about 1.5 times as many as Call (around 5,000 transactions) and about three times as many as Email + Call (just over 2,500 transactions). While Email attracts the most customers, it does not necessarily generate the highest revenue per transaction.

![customer_count](customer_count.png)

## What Does the Spread of Revenue Look Like Overall?
The histogram reveals that most transactions generate revenue between \$50 and \$120, while fewer transactions exceed \$230. The revenue distribution is slightly right-skewed, meaning a small number of transactions contribute significantly more revenue than the majority. Understanding these high-revenue transactions is crucial for maximizing profitability.

![revenue_distribution](revenue_distribution.png)

## What Does the Spread of Revenue Look Like for Each Method?
As shown in the box plot, Email + Call has the highest median revenue per transaction at approximately \$185 and the widest revenue range, indicating a large variation in revenue per sale. Its median is almost twice that of Email (\$95 per transaction) and nearly four times that of Call (just below \$50). This confirms that Email + Call delivers the most revenue per sale, while also showing opportunities to optimize high-revenue transactions.

![revenue_by_method](revenue_by_method.png)

## Was There Any Difference in Revenue Over Time for Each Method?
The line chart demonstrates how Email starts strong in Week 1 but declines steadily and does not recover. Call grows gradually, reaching its peak in Week 5, but drops afterward. Email + Call also peaks in Week 5 but maintains higher revenue after its peak, making it the most stable and profitable method long-term.

![revenue_over_time](revenue_over_time.png)
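A weekly series like the one plotted above can be built by summing revenue for each combination of sales method and week. The sketch below assumes the chart shows total weekly revenue (replace the sum with a mean if revenue per transaction is wanted); the `weekly_revenue` helper and the sample rows are hypothetical and only illustrate the aggregation.

```python
from collections import defaultdict


def weekly_revenue(rows):
    """Sum revenue per sales method and week; returns {method: {week: total}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row["sales_method"]][row["week"]] += row["revenue"]
    # Sort weeks so each series is ready to plot left to right.
    return {method: dict(sorted(weeks.items())) for method, weeks in totals.items()}


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    {"week": 1, "sales_method": "Email", "revenue": 90.0},
    {"week": 1, "sales_method": "Email", "revenue": 100.0},
    {"week": 2, "sales_method": "Email", "revenue": 95.0},
    {"week": 1, "sales_method": "Call", "revenue": 40.0},
    {"week": 2, "sales_method": "Call", "revenue": 50.0},
]
series = weekly_revenue(rows)

# Week with the highest total for each method.
peak_week = {method: max(weeks, key=weeks.get) for method, weeks in series.items()}
print(series)
print(peak_week)
```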

## Based on the Data, Which Sales Method Should We Continue to Use?
As we can see from the bar chart, Email generated the highest total revenue, over \$700K, which is approximately three times that of Call (almost \$250K). Meanwhile, Email + Call, despite having fewer customers, still contributed over \$450K, proving its efficiency in generating high-value transactions. Call has the lowest total revenue, reinforcing its lower effectiveness. However, total revenue alone does not determine the best method, as revenue efficiency per customer is equally important.

![total_revenue](total_revenue.png)

The bar chart below confirms that Email + Call generates the highest revenue per customer at over \$180, almost double that of Email (around \$95 per customer) and about four times that of Call (around \$45 per customer). Email + Call therefore delivers the highest revenue efficiency per transaction.

![revenue_per_customer](revenue_per_customer.png)

**_This comparison highlights that while Email brings in more total revenue, Email + Call ensures the highest revenue efficiency per transaction, making it the strongest option moving forward._**

## Business Metric to Monitor
To ensure long-term profitability, the company should track **Revenue per Customer per Sales Method** rather than total sales volume alone. This metric shows how much revenue each customer contributes under each approach, so the company can identify the most valuable sales approach and optimize its strategy accordingly. Based on the current data, the initial values are over \$180 per customer for Email + Call, around \$95 for Email, and around \$45 for Call. Email + Call is therefore the most effective method. Call has potential but lacks consistency, while Email, despite attracting the most customers, has lower revenue per customer. Monitoring this metric will allow the company to refine its sales strategy and focus on high-value transactions.
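As a rough sketch of how the metric could be recomputed each reporting period, the snippet below divides total revenue by the number of distinct customers for each sales method. The `revenue_per_customer` helper and the sample rows are hypothetical; only the column names come from the dataset.

```python
from collections import defaultdict


def revenue_per_customer(rows):
    """Return {sales_method: total revenue / number of distinct customers}."""
    revenue = defaultdict(float)
    customers = defaultdict(set)
    for row in rows:
        revenue[row["sales_method"]] += row["revenue"]
        customers[row["sales_method"]].add(row["customer_id"])
    return {method: revenue[method] / len(customers[method]) for method in revenue}


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    {"customer_id": "c1", "sales_method": "Email", "revenue": 90.0},
    {"customer_id": "c2", "sales_method": "Email", "revenue": 100.0},
    {"customer_id": "c3", "sales_method": "Call", "revenue": 40.0},
    {"customer_id": "c4", "sales_method": "Call", "revenue": 50.0},
    {"customer_id": "c5", "sales_method": "Email + Call", "revenue": 180.0},
    {"customer_id": "c6", "sales_method": "Email + Call", "revenue": 190.0},
]
metric = revenue_per_customer(rows)
print(metric)  # {'Email': 95.0, 'Call': 45.0, 'Email + Call': 185.0}
```

Counting distinct `customer_id` values rather than rows keeps the metric correct if a customer ever appears more than once, although in the current data each customer appears only once.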

## Recommendations
To maximize revenue and sales efficiency, the company should implement the following strategies:
- **Prioritize the Most Profitable Sales Method:**
    - Expand the use of Email + Call, as it consistently delivers the highest revenue per customer.
    - Invest in sales training to refine this method and enhance customer interactions.
- **Enhance Email-Only Sales Performance:**
    - Test pricing or bundling strategies to increase revenue per transaction.
    - Segment customers and personalize offers to drive higher-value sales.
- **Optimize Call-Based Sales:**
    - Identify high-value leads and prioritize outreach to improve efficiency.
    - Introduce automated follow-ups to enhance conversion rates.
- **Leverage Data for Continuous Improvement:**
    - Monitor Revenue per Customer per Sales Method to adjust strategies dynamically.
    - A/B test sales strategies to determine the most effective approaches.
    - Encourage website engagement as data suggests site visitors tend to spend more.
   

