### Roadmap to Learn Data Analysis Using Python

- **Strong foundation in Python basics**
  - Programming fundamentals, variables, data types
  - Control structures (if, for, while)
  - Functions and scope
  - Containers: lists, tuples, dictionaries, sets
  - List/dictionary comprehensions
  - Basic error handling

- **NumPy**
  - Creating, indexing, and slicing arrays
  - Array operations and broadcasting
  - Mathematical and statistical functions
  - Reshaping and manipulating arrays

- **Pandas**
  - DataFrames and Series: creation and manipulation
  - Reading/writing data (CSV, Excel, etc.)
  - Data cleaning, handling missing data
  - Filtering, sorting, grouping, aggregation
  - Merging and joining datasets

- **Data Visualization**
  - Matplotlib: basic plots (line, bar, scatter, histogram)
  - Seaborn: advanced and statistical visualizations
  - Customizing plots and dashboards

- **Exploratory Data Analysis (EDA)**
  - Descriptive statistics (mean, median, mode, variance)
  - Outlier detection and handling
  - Feature engineering and transformation
  - Correlation analysis

- **Working with Real-world Data**
  - APIs and web data (using requests, BeautifulSoup, or Scrapy for web scraping)
  - Working with time series data (datetime module, pandas time-indexed data)

- **Statistical Analysis**
  - Probability concepts
  - Hypothesis testing
  - Using libraries like scipy.stats for statistical tests

- **Introduction to Machine Learning (optional but recommended)**
  - Scikit-learn basics: supervised and unsupervised learning
  - Data preprocessing for ML
  - Model evaluation (accuracy, confusion matrix, cross-validation)
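
Two of the roadmap items above, NumPy broadcasting and pandas grouping with aggregation, can be sketched in a few lines. The arrays and the `toy` table are made up purely for illustration and are not part of the coffee sales data.

```python
import numpy as np
import pandas as pd

# NumPy broadcasting: subtract each column's mean from every row.
values = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
# values has shape (3, 2); the column means have shape (2,) and are
# broadcast across all three rows.
centered = values - values.mean(axis=0)

# Pandas grouping and aggregation on a tiny made-up table.
toy = pd.DataFrame({
    "city": ["A", "A", "B"],
    "sales": [100.0, 150.0, 80.0],
})
sales_by_city = toy.groupby("city")["sales"].sum()

print(centered)
print(sales_by_city)
```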

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Read the data; header=3 takes the column names from the row at index 3
df = pd.read_csv('coffee_sales.csv', header=3)
# Drop columns that are completely empty
df.dropna(axis=1, how='all', inplace=True)

# Clean the data: parse 'Date' as datetime, then strip '$' and ',' from the
# currency columns so they can be converted to floats
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
for col in ['Sales', 'Profit', 'Target Sales', 'Target Profit']:
    df[col] = df[col].str.replace('[$,]', '', regex=True).astype(float)

df.head()

| Unnamed: 0 | Date | Franchise | City | Product | Sales | Profit | Target Profit | Target Sales |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-01-01 | M1 | Mumbai | Amaretto | 219.0 | 94.0 | 100.0 | 220.0 |
| 1 | 2021-02-01 | M1 | Mumbai | Amaretto | 140.0 | 34.0 | 50.0 | 140.0 |
| 2 | 2021-03-01 | M1 | Mumbai | Amaretto | 145.0 | -2.0 | 30.0 | 180.0 |
| 3 | 2021-04-01 | M1 | Mumbai | Amaretto | 45.0 | 11.0 | 20.0 | 40.0 |
| 4 | 2021-05-01 | M1 | Mumbai | Amaretto | 120.0 | 13.0 | 30.0 | 120.0 |


Ex. Visualise Target status frequency
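
One possible approach, as a minimal sketch: the data shown above has no "Target status" column, so this example assumes it means whether `Sales` reached `Target Sales`. The `Target Status` column and its "Met"/"Missed" labels are hypothetical names, and `sample` is a stand-in holding only the five rows from the `df.head()` output; in the notebook you would use the full `df` instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the cleaned df: the five rows shown by df.head()
sample = pd.DataFrame({
    "Sales": [219.0, 140.0, 145.0, 45.0, 120.0],
    "Target Sales": [220.0, 140.0, 180.0, 40.0, 120.0],
})

# Assumption: a record has "Met" its target when Sales >= Target Sales
sample["Target Status"] = np.where(
    sample["Sales"] >= sample["Target Sales"], "Met", "Missed"
)

# Frequency of each status
status_counts = sample["Target Status"].value_counts()

# Bar chart of the frequencies
fig, ax = plt.subplots()
ax.bar(status_counts.index, status_counts.values)
ax.set_xlabel("Target Status")
ax.set_ylabel("Number of records")
ax.set_title("Target status frequency")
plt.show()
```

If your definition of target status is based on profit instead, swap in `Profit` and `Target Profit` in the comparison.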

Ex. Descriptive statistics
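
A minimal sketch of one way to start, using only standard pandas methods. `sample` is a stand-in holding the five rows from the `df.head()` output; in the notebook you would call the same methods on the full `df`.

```python
import pandas as pd

# Stand-in for the cleaned df: the five rows shown by df.head()
sample = pd.DataFrame({
    "Sales": [219.0, 140.0, 145.0, 45.0, 120.0],
    "Profit": [94.0, 34.0, -2.0, 11.0, 13.0],
})

# count, mean, std, min, quartiles and max for every numeric column
summary = sample.describe()

# Individual statistics named in the roadmap
medians = sample.median()
variances = sample.var()             # sample variance (ddof=1 by default)
sales_mode = sample["Sales"].mode()  # every value tied for most frequent

print(summary)
print(medians)
print(variances)
```

Note that `mode()` returns all values when none repeats, as is the case for these five rows.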

Ex. Correlation between Sales and Profit
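
A minimal sketch, assuming the Pearson coefficient is the measure wanted (the exercise does not say which one). `sample` is a stand-in holding the five rows from the `df.head()` output; five rows are far too few to draw conclusions, so run this on the full `df` in practice.

```python
import pandas as pd

# Stand-in for the cleaned df: the five rows shown by df.head()
sample = pd.DataFrame({
    "Sales": [219.0, 140.0, 145.0, 45.0, 120.0],
    "Profit": [94.0, 34.0, -2.0, 11.0, 13.0],
})

# Series.corr computes the Pearson correlation by default
corr = sample["Sales"].corr(sample["Profit"])

# The full correlation matrix gives the same value off the diagonal
corr_matrix = sample[["Sales", "Profit"]].corr()

print(f"Correlation between Sales and Profit: {corr:.3f}")
print(corr_matrix)
```

Passing `method="spearman"` to `corr` gives a rank-based alternative that is less sensitive to outliers.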