# Data Visualization and Charts – Complete Overview

## 1. What is a Chart

A chart is a visual representation of data that helps quickly identify patterns, comparisons, and trends that are not easily visible in raw tables or numbers.  
It uses visual elements such as bars, lines, points, and colors to represent data values.

### Example
A bar chart showing total sales for each product category.

### Real-life Use Case
An e-commerce company uses a bar chart to compare sales of categories like Electronics, Fashion, and Home Appliances to decide where to focus marketing spend.

---

## 2. What is Data Visualization

Data Visualization is the process of converting data into visual formats like charts, graphs, or maps to make complex information easier to understand and interpret.  
It helps bridge the gap between data analysis and decision making.

### Example
Turning a CSV file of monthly revenue into a line chart that shows seasonal trends.

### Real-life Use Case
A marketing analyst visualizes website traffic and conversions in Power BI dashboards to track campaign effectiveness.

---

## 3. Role of Charts in Data Visualization

Charts are the core building blocks of data visualization.  
They help to:
- Simplify large datasets into clear visuals  
- Highlight patterns, relationships, and trends  
- Support data-driven decision making  
- Communicate findings effectively to non-technical audiences  

### Example
A project manager uses a Gantt chart to visualize project timelines instead of reading a text-based schedule.

---

## 4. Data Types Supported by Charts

Charts can visualize different data types, and the choice of chart depends on the type of data being plotted:

| Data Type | Description | Example | Common Charts |
|------------|--------------|----------|----------------|
| Numerical (Quantitative) | Numbers with measurable values | Age, Salary, Sales | Line, Histogram, Scatter, Box |
| Categorical (Nominal) | Text or labels defining categories with no inherent order | Country, Gender, Product Type | Bar, Pie, Treemap |
| Ordinal | Categories with a meaningful order | Rating (Low, Medium, High) | Bar, Column, Heatmap |
| Time-Series (Temporal) | Data based on time intervals | Month, Date, Year | Line, Area, Timeline |
| Boolean | True or False, Yes or No | Purchase made (Yes or No) | Bar, Pie |
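
The mapping in the table above can be sketched as a small lookup. This is a minimal illustration, not a definitive rule set: `classify_series` and `suggest_charts` are hypothetical helper names, the toy values are invented for the example, and ordinal data is assumed to be stored as an ordered pandas categorical.

```python
import pandas as pd
from pandas.api import types as ptypes

# Chart suggestions copied from the table above, keyed by data type.
CHARTS_BY_TYPE = {
    "boolean": ["Bar", "Pie"],
    "temporal": ["Line", "Area", "Timeline"],
    "numerical": ["Line", "Histogram", "Scatter", "Box"],
    "ordinal": ["Bar", "Column", "Heatmap"],
    "categorical": ["Bar", "Pie", "Treemap"],
}


def classify_series(series: pd.Series) -> str:
    """Hypothetical helper: map a Series to one of the data types above."""
    # Check bool before numeric, because pandas treats bool as numeric.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_datetime64_any_dtype(series):
        return "temporal"
    if ptypes.is_numeric_dtype(series):
        return "numerical"
    # Assumption: ordinal data is stored as an ordered categorical.
    if isinstance(series.dtype, pd.CategoricalDtype) and series.cat.ordered:
        return "ordinal"
    return "categorical"


def suggest_charts(series: pd.Series) -> list:
    """Hypothetical helper: look up the chart suggestions for a Series."""
    return CHARTS_BY_TYPE[classify_series(series)]


# Toy values invented for illustration only.
rating = pd.Series(
    pd.Categorical(
        ["Low", "High", "Medium"],
        categories=["Low", "Medium", "High"],
        ordered=True,
    )
)
country = pd.Series(["India", "Japan", "Brazil"])

print(suggest_charts(rating))
print(suggest_charts(country))
```

The order of the checks matters: a Boolean column would otherwise be reported as numerical.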

---

## 5. Types of Charts in Data Visualization with Examples and Use Cases

| Chart Type | Description | Example | Real-life Use Case |
|-------------|--------------|----------|---------------------|
| Bar Chart | Compares categories using rectangular bars | Sales by region | Compare revenue by department |
| Column Chart | Vertical version of bar chart | Monthly sales figures | Track monthly sales growth |
| Line Chart | Shows trends over time | Stock price over 12 months | Monitor financial or climate trends |
| Pie Chart | Shows proportion of categories in a whole | Market share of brands | Display brand market share |
| Histogram | Shows frequency distribution of numerical data | Age distribution of employees | Analyze customer age groups |
| Box Plot | Displays spread, median, and outliers | Salary distribution by department | Identify salary variation in HR |
| Scatter Plot | Shows relationship between two numerical variables | Height vs Weight | Correlation between ad spend and sales |
| Bubble Chart | Scatter plot with bubble size as a third variable | Sales vs profit by region, with bubble size representing revenue | Compare multiple business metrics |
| Area Chart | Fills the space under a line chart | Website traffic over time | Show total growth visually |
| Heatmap | Represents data through color intensity | Average marks by subject and class | Visualize performance or ratings |
| Treemap | Displays hierarchical data using nested rectangles | Revenue by region and sub region | Show category contribution |
| Violin Plot | Shows distribution and density | Customer spend by region | Visualize variation in spending behavior |
| Donut Chart | Variation of pie chart with a hollow center | Sales distribution by category | Highlight category share |
| Stacked Bar or Area Chart | Shows composition across categories | Revenue by year split by product | Show contribution over time |
| Radar Chart | Shows multivariate comparisons | Employee skill ratings | Compare performance across skills |
| Correlation Heatmap | Shows the strength of pairwise relationships between numeric variables using color | Correlation between revenue, profit, and cost | Identify KPI relationships |

---

## 6. Summary

- Charts are visual tools to communicate data insights effectively  
- Data Visualization transforms raw data into visuals that support decision making  
- Charts play a core role in summarizing data patterns, comparisons, and relationships  
- Supported data types include Numerical, Categorical, Ordinal, Temporal, and Boolean  
- Common chart types include Bar, Line, Pie, Scatter, Histogram, Box, Heatmap, Treemap, and others  
- Each chart has specific use cases depending on the data and analysis goal  

In short, charts are the language of data: they turn numbers into stories, making analysis faster, clearer, and more actionable.


```python
import pandas as pd

# Load the dataset
file_path = '/Users/nishkarsh/Desktop/Infosys Internship/Week 2 /Updated_Dataset.csv'
df = pd.read_csv(file_path)
df.head(10)
```


| Unnamed: 0 | Age | Age_Group | Gender | Avg_Daily_Screen_Time_hr | awareness | Primary_Device | Device_Category | Screen_Size | Exceeded_Recommended_Limit | Educational_to_Recreational_Ratio | Health_Impacts | Health_Impact_Category | Urban_or_Rural |
|------------|-----|-----------|--------|--------------------------|-----------|----------------|-----------------|-------------|----------------------------|-----------------------------------|----------------|------------------------|----------------|
| 0 | 14 | Teenagers | Male | 3.99 | Need Attention | Smartphone | Portable | <30 | True | 0.42 | Poor Sleep, Eye Strain | Both Physical and Mental | Urban |
| 1 | 11 | Pre-teens | Female | 4.61 | Need Attention | Laptop | Portable | <30 | True | 0.3 | Poor Sleep | Mental | Urban |
| 2 | 18 | Late teens | Female | 3.73 | Need Attention | TV | Wallmounted | >30 | True | 0.32 | Poor Sleep | Mental | Urban |
| 3 | 15 | Teenagers | Female | 1.21 | No harm | Laptop | Portable | <30 | False | 0.39 | No health impacts | No Impact | Urban |
| 4 | 12 | Pre-teens | Female | 5.89 | Need Attention | Smartphone | Portable | <30 | True | 0.49 | Poor Sleep, Anxiety | Mental | Urban |
| 5 | 14 | Teenagers | Female | 4.88 | Need Attention | Smartphone | Portable | <30 | True | 0.44 | Poor Sleep | Mental | Urban |
| 6 | 17 | Late teens | Male | 2.97 | No harm | TV | Wallmounted | >30 | False | 0.48 | No health impacts | No Impact | Rural |
| 7 | 10 | Pre-teens | Male | 2.74 | No harm | TV | Wallmounted | >30 | True | 0.54 | No health impacts | No Impact | Urban |
| 8 | 14 | Teenagers | Male | 4.61 | Need Attention | Laptop | Portable | <30 | True | 0.36 | Poor Sleep, Anxiety | Mental | Rural |
| 9 | 18 | Late teens | Male | 3.24 | Need Attention | Tablet | Portable | <30 | True | 0.48 | Poor Sleep, Obesity Risk | Both Physical and Mental | Urban |


# Data Visualization Summary and Chart Mapping

## Dataset Overview

| Column Name | Data Type | Unique Values | Description (inferred) |
|--------------|------------|----------------|-------------------------|
| Age | int64 | 11 | Numerical age of participants |
| Age_Group | object | 3 | Categorical age brackets (Pre-teens, Teenagers, Late teens) |
| Gender | object | 2 | Categorical (Male/Female) |
| Avg_Daily_Screen_Time_hr | float64 | 899 | Numerical (continuous) – average screen time per day |
| awareness | object | 5 | Categorical (awareness labels, e.g., Need Attention, No harm) |
| Primary_Device | object | 4 | Categorical (Smartphone, Laptop, TV, Tablet) |
| Device_Category | object | 2 | Categorical (Portable, Wallmounted) |
| Screen_Size | object | 2 | Categorical (<30, >30) |
| Exceeded_Recommended_Limit | bool | 2 | Boolean – whether user exceeded screen limit |
| Educational_to_Recreational_Ratio | float64 | 31 | Numerical ratio (continuous) |
| Health_Impacts | object | 16 | Categorical (specific health concerns) |
| Health_Impact_Category | object | 4 | Categorical grouping of impacts |
| Urban_or_Rural | object | 2 | Categorical (living area type) |
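
An overview like the one above could be assembled from `dtypes` and `nunique()`. The sketch below is only an illustration: it rebuilds a few columns from the ten rows shown in the `df.head(10)` output (the name `sample` is an assumption), so its unique counts will be smaller than the full-dataset counts in the table.

```python
import pandas as pd

# A few columns from the ten rows shown in the df.head(10) output.
sample = pd.DataFrame({
    "Age": [14, 11, 18, 15, 12, 14, 17, 10, 14, 18],
    "Age_Group": ["Teenagers", "Pre-teens", "Late teens", "Teenagers",
                  "Pre-teens", "Teenagers", "Late teens", "Pre-teens",
                  "Teenagers", "Late teens"],
    "Gender": ["Male", "Female", "Female", "Female", "Female",
               "Female", "Male", "Male", "Male", "Male"],
    "Avg_Daily_Screen_Time_hr": [3.99, 4.61, 3.73, 1.21, 5.89,
                                 4.88, 2.97, 2.74, 4.61, 3.24],
    "Exceeded_Recommended_Limit": [True, True, True, False, True,
                                   True, False, True, True, True],
})

# One row per column: its pandas dtype and its number of distinct values.
overview = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "unique_values": sample.nunique(),
})
print(overview)
```

Running the same two calls on the full `df` should reproduce the Data Type and Unique Values columns of the table.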

---

## Data Type Summary

- Numerical columns: Age, Avg_Daily_Screen_Time_hr, Educational_to_Recreational_Ratio  
- Categorical columns: Age_Group, Gender, awareness, Primary_Device, Device_Category, Screen_Size, Health_Impacts, Health_Impact_Category, Urban_or_Rural  
- Boolean column: Exceeded_Recommended_Limit  
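
The grouping above can be reproduced with `select_dtypes`. This is a minimal sketch on a ten-row sample taken from the `df.head(10)` output; the variable names are assumptions, and it treats every column that is neither numeric nor Boolean as categorical.

```python
import pandas as pd

# A few columns from the ten rows shown in the df.head(10) output.
sample = pd.DataFrame({
    "Age": [14, 11, 18, 15, 12, 14, 17, 10, 14, 18],
    "Gender": ["Male", "Female", "Female", "Female", "Female",
               "Female", "Male", "Male", "Male", "Male"],
    "Avg_Daily_Screen_Time_hr": [3.99, 4.61, 3.73, 1.21, 5.89,
                                 4.88, 2.97, 2.74, 4.61, 3.24],
    "Exceeded_Recommended_Limit": [True, True, True, False, True,
                                   True, False, True, True, True],
    "Educational_to_Recreational_Ratio": [0.42, 0.30, 0.32, 0.39, 0.49,
                                          0.44, 0.48, 0.54, 0.36, 0.48],
    "Urban_or_Rural": ["Urban", "Urban", "Urban", "Urban", "Urban",
                       "Urban", "Rural", "Urban", "Rural", "Urban"],
})

# "number" selects integer and float columns; bool columns are not included.
numerical_cols = sample.select_dtypes(include="number").columns.tolist()
boolean_cols = sample.select_dtypes(include="bool").columns.tolist()
# Assumption: whatever remains (text labels) is treated as categorical.
categorical_cols = sample.select_dtypes(exclude=["number", "bool"]).columns.tolist()

print("Numerical:", numerical_cols)
print("Boolean:", boolean_cols)
print("Categorical:", categorical_cols)
```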

---


## 🧩 1. Univariate (Single Column) Analysis

| Column | Type | Recommended Charts | Purpose |
|---------|------|--------------------|----------|
| **Age** | Numerical | Histogram, Box Plot, KDE Density | Understand age distribution, outliers, skewness |
| **Age_Group** | Categorical (ordinal) | Bar Chart, Count Plot, Pareto Chart | Frequency of each age category |
| **Gender** | Categorical (nominal) | Bar Chart, Pie Chart | Gender distribution |
| **Avg_Daily_Screen_Time_hr** | Numerical (continuous) | Histogram, Box Plot, Density Plot | Screen time variation across respondents |
| **awareness** | Categorical (nominal) | Bar Chart, Pie Chart | Distribution of awareness labels (e.g., Need Attention, No harm) |
| **Primary_Device** | Categorical | Bar Chart, Treemap, Donut Chart | Most commonly used devices |
| **Device_Category** | Categorical | Bar Chart, Treemap | Compare usage by category (Portable, Wallmounted) |
| **Screen_Size** | Categorical (binary: <30, >30) | Bar Chart, Pie Chart | Share of respondents in each screen-size band |
| **Exceeded_Recommended_Limit** | Binary | Bar Chart, Pie Chart | Proportion of users exceeding limits |
| **Educational_to_Recreational_Ratio** | Numerical (ratio) | Histogram, Box Plot, Violin Plot | How balanced screen time usage is |
| **Health_Impacts** | Categorical | Bar Chart, Donut Chart | Frequency of specific health impacts (e.g., eye strain) |
| **Health_Impact_Category** | Categorical (nominal) | Bar Chart, Pareto Chart | Distribution of impact categories (e.g., Mental, Both Physical and Mental, No Impact) |
| **Urban_or_Rural** | Categorical | Bar Chart, Pie Chart | Respondent location split |
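
As a sketch of two rows from the table above, the following draws a histogram for a numerical column and a bar chart of counts for a categorical one. It assumes matplotlib is installed and uses only the ten rows shown in the `df.head(10)` output, so the shapes are illustrative rather than representative of the full dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Two columns from the ten rows shown in the df.head(10) output.
sample = pd.DataFrame({
    "Avg_Daily_Screen_Time_hr": [3.99, 4.61, 3.73, 1.21, 5.89,
                                 4.88, 2.97, 2.74, 4.61, 3.24],
    "Primary_Device": ["Smartphone", "Laptop", "TV", "Laptop", "Smartphone",
                       "Smartphone", "TV", "TV", "Laptop", "Tablet"],
})

# Frequency of each device, sorted from most to least common.
device_counts = sample["Primary_Device"].value_counts()

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Numerical column: histogram of daily screen time (bin count is arbitrary).
bin_counts, bin_edges, _ = ax_hist.hist(
    sample["Avg_Daily_Screen_Time_hr"], bins=5, edgecolor="black"
)
ax_hist.set_title("Avg_Daily_Screen_Time_hr")
ax_hist.set_xlabel("Hours per day")
ax_hist.set_ylabel("Respondents")

# Categorical column: one bar per device.
ax_bar.bar(device_counts.index.tolist(), device_counts.to_numpy())
ax_bar.set_title("Primary_Device")
ax_bar.set_ylabel("Respondents")

fig.tight_layout()
# In a notebook the figure renders inline; in a script, call plt.show().
```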

---

## 🔗 2. Bivariate (Two Columns at a Time)

### A. Numerical vs Numerical

| Pair | Recommended Charts | Insight |
|------|--------------------|----------|
| **Age vs Avg_Daily_Screen_Time_hr** | Scatter Plot, Regression Plot, Hexbin | Check correlation between age and screen time |
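| **Screen_Size vs Avg_Daily_Screen_Time_hr** | Box Plot, Violin Plot | Does larger screen size relate to more usage? (`Screen_Size` is a two-level category here, `<30` and `>30`, so this is effectively a categorical vs numerical comparison) |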
| **Educational_to_Recreational_Ratio vs Avg_Daily_Screen_Time_hr** | Scatter / Line | Relationship between usage balance and total time |
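
For the Age vs Avg_Daily_Screen_Time_hr pair, a scatter plot can be paired with a Pearson correlation coefficient. This is a minimal sketch assuming matplotlib is available; with only the ten rows from the `df.head(10)` output, the coefficient is illustrative and should not be read as a finding about the full dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Two numerical columns from the ten rows shown in the df.head(10) output.
sample = pd.DataFrame({
    "Age": [14, 11, 18, 15, 12, 14, 17, 10, 14, 18],
    "Avg_Daily_Screen_Time_hr": [3.99, 4.61, 3.73, 1.21, 5.89,
                                 4.88, 2.97, 2.74, 4.61, 3.24],
})

# Pearson correlation (the default method of Series.corr).
r = sample["Age"].corr(sample["Avg_Daily_Screen_Time_hr"])

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(sample["Age"], sample["Avg_Daily_Screen_Time_hr"])
ax.set_xlabel("Age")
ax.set_ylabel("Avg_Daily_Screen_Time_hr")
ax.set_title(f"Age vs screen time (Pearson r = {r:.2f})")
fig.tight_layout()
```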

---

### B. Categorical vs Numerical

| Pair | Recommended Charts | Insight |
|------|--------------------|----------|
| **Age_Group vs Avg_Daily_Screen_Time_hr** | Box / Violin / Point Plot | Compare screen time across age groups |
| **Gender vs Avg_Daily_Screen_Time_hr** | Box / Violin | Compare screen habits by gender |
| **Primary_Device vs Avg_Daily_Screen_Time_hr** | Box / Bar | Which device users spend most time on |
| **Device_Category vs Avg_Daily_Screen_Time_hr** | Bar / Box | Same category-level comparison |
| **Exceeded_Recommended_Limit vs Avg_Daily_Screen_Time_hr** | Box / Bar | Difference between over-limit and under-limit groups |
| **Urban_or_Rural vs Avg_Daily_Screen_Time_hr** | Box / Violin | Urban vs rural screen time contrast |
| **Health_Impact_Category vs Avg_Daily_Screen_Time_hr** | Box / Point Plot | How screen time varies across reported health impact categories |
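
A box plot per category is one way to draw the comparisons in this table. The sketch below does it for Age_Group using the ten rows from the `df.head(10)` output and assumes matplotlib is available. The ordering Pre-teens, Teenagers, Late teens is an assumption inferred from the ages in those rows.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Two columns from the ten rows shown in the df.head(10) output.
sample = pd.DataFrame({
    "Age_Group": ["Teenagers", "Pre-teens", "Late teens", "Teenagers",
                  "Pre-teens", "Teenagers", "Late teens", "Pre-teens",
                  "Teenagers", "Late teens"],
    "Avg_Daily_Screen_Time_hr": [3.99, 4.61, 3.73, 1.21, 5.89,
                                 4.88, 2.97, 2.74, 4.61, 3.24],
})

# Assumed display order, youngest group first.
order = ["Pre-teens", "Teenagers", "Late teens"]

# One array of screen-time values per age group.
groups = [
    sample.loc[sample["Age_Group"] == group, "Avg_Daily_Screen_Time_hr"].to_numpy()
    for group in order
]

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(groups)  # boxes are drawn at x positions 1, 2, 3
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(order)
ax.set_ylabel("Avg_Daily_Screen_Time_hr")
ax.set_title("Screen time by age group")
fig.tight_layout()

# The same comparison as numbers: median screen time per group.
medians = sample.groupby("Age_Group")["Avg_Daily_Screen_Time_hr"].median()
print(medians)
```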

---

### C. Categorical vs Categorical

| Pair | Recommended Charts | Insight |
|------|--------------------|----------|
| **Gender vs awareness** | Grouped/Stacked Bar, Mosaic | Awareness gap between genders |
| **Age_Group vs awareness** | 100% Stacked Bar | Which age groups are more aware |
| **Urban_or_Rural vs awareness** | Grouped Bar | Location-based awareness difference |
| **Primary_Device vs Device_Category** | Grouped Bar, Treemap | Mapping devices to category types |
| **Age_Group vs Device_Category** | Mosaic / Heatmap (counts) | Device preference by age segment |
| **Gender vs Exceeded_Recommended_Limit** | Stacked Bar | Who exceeds limits more often |
| **Health_Impact_Category vs Exceeded_Recommended_Limit** | Heatmap, Stacked Bar | Relationship between impact category and overuse |
| **Urban_or_Rural vs Device_Category** | Grouped Bar | Device adoption patterns by region |
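
Categorical pairs are usually summarized with a cross-tabulation before plotting. The following sketch builds the counts and row percentages for Age_Group vs awareness and draws them as a 100% stacked bar. It assumes matplotlib is installed (pandas plotting depends on it) and uses only the ten rows from the `df.head(10)` output, in which just two of the five awareness labels appear.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Two columns from the ten rows shown in the df.head(10) output.
sample = pd.DataFrame({
    "Age_Group": ["Teenagers", "Pre-teens", "Late teens", "Teenagers",
                  "Pre-teens", "Teenagers", "Late teens", "Pre-teens",
                  "Teenagers", "Late teens"],
    "awareness": ["Need Attention", "Need Attention", "Need Attention",
                  "No harm", "Need Attention", "Need Attention",
                  "No harm", "No harm", "Need Attention", "Need Attention"],
})

# Raw counts for each (age group, awareness label) combination.
counts = pd.crosstab(sample["Age_Group"], sample["awareness"])

# Row-normalized percentages, so every age group sums to 100.
shares = pd.crosstab(sample["Age_Group"], sample["awareness"], normalize="index") * 100

ax = shares.plot(kind="bar", stacked=True, figsize=(6, 4))
ax.set_ylabel("Share of respondents (%)")
ax.set_title("awareness by Age_Group")
plt.tight_layout()
```

Passing `counts` instead of `shares` to the same plot call gives an ordinary stacked bar, and `stacked=False` gives a grouped bar.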

---

## 🧮 3. Multivariate (3 or More Columns)

| Combination | Recommended Visual | Purpose |
|--------------|--------------------|----------|
| **Age_Group + Gender + Avg_Daily_Screen_Time_hr** | Faceted Box Plots or Grouped Bar Chart | Compare screen time distributions by gender within age groups |
| **Device_Category + awareness + Exceeded_Recommended_Limit** | Stacked Bar / Treemap | Identify which device categories drive overuse or low awareness |
| **Educational_to_Recreational_Ratio + Health_Impact_Category + awareness** | Bubble Plot or Colored Scatter | See if balance in usage relates to health and awareness |
| **Age_Group + Urban_or_Rural + awareness** | Clustered Bar / Mosaic | Awareness pattern across age & region combinations |
| **Primary_Device + Avg_Daily_Screen_Time_hr + Health_Impact_Category** | Heatmap or 3D Scatter | Link device type, usage, and health impact |
| **Age_Group + Gender + Exceeded_Recommended_Limit + Health_Impact_Category** | Multi-layer Stacked Bar / Dashboard Panel | Comprehensive picture of who is most at risk of overuse and health issues |
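
For the Age_Group + Gender + Avg_Daily_Screen_Time_hr combination, a pivot table of means feeds directly into a grouped bar chart. This is a simplified sketch: it plots group means rather than the full distributions a faceted box plot would show, assumes matplotlib is installed, and uses only the ten rows from the `df.head(10)` output.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Three columns from the ten rows shown in the df.head(10) output.
sample = pd.DataFrame({
    "Age_Group": ["Teenagers", "Pre-teens", "Late teens", "Teenagers",
                  "Pre-teens", "Teenagers", "Late teens", "Pre-teens",
                  "Teenagers", "Late teens"],
    "Gender": ["Male", "Female", "Female", "Female", "Female",
               "Female", "Male", "Male", "Male", "Male"],
    "Avg_Daily_Screen_Time_hr": [3.99, 4.61, 3.73, 1.21, 5.89,
                                 4.88, 2.97, 2.74, 4.61, 3.24],
})

# Rows are age groups, columns are genders, cells are mean screen time.
mean_time = sample.pivot_table(
    index="Age_Group",
    columns="Gender",
    values="Avg_Daily_Screen_Time_hr",
    aggfunc="mean",
)

# Grouped bar chart: one cluster per age group, one bar per gender.
ax = mean_time.plot(kind="bar", figsize=(6, 4))
ax.set_ylabel("Mean Avg_Daily_Screen_Time_hr")
ax.set_title("Mean screen time by age group and gender")
plt.tight_layout()
```

A fourth variable can be added by repeating the plot once per level of that variable, which is the idea behind the faceted and dashboard layouts in the table.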

---

