# **Pandas Day 2**

---

## **1 : Different Names of Columns and Rows :-**

### **Different Names of Rows :**

- Records
- Observations
- Instances
- Entities
- Data Points

### **Different Names of Columns :**

- Attributes
- Dimensions
- Variables
- Properties
- Fields
- Features
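
In pandas, these terms map directly onto a DataFrame: each row is one record and each column is one feature. A minimal sketch, assuming made-up student data (the names and values are illustrative only):

```python
import pandas as pd

# Hypothetical data: each row is one record (observation),
# each column is one feature (attribute).
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "roll_no": [1, 2, 3],
    "marks": [82.5, 74.0, 91.0],
})

n_rows, n_cols = df.shape          # (number of records, number of features)
feature_names = list(df.columns)   # column labels
first_record = df.iloc[0]          # one row, returned as a Series

print(n_rows, n_cols)
print(feature_names)
print(first_record["name"])
```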

---

## **2 : Data Life Cycle :-**

Acquire -> Clean -> Use -> Publish -> Reserve/Save

The data life cycle defines how data is handled from the moment it is collected until it is archived or deleted. Here are the 5 key steps:

### **1. Acquire :**

- Data is collected from various sources such as sensors, databases, surveys, APIs, or manual entries.
- Focus is on gathering **raw data** that will later be refined.
- Example: Collecting weather data from sensors or user feedback from a web form.

### **2. Clean :**

- The raw data is **checked and corrected** for errors, duplicates, or inconsistencies.
- This step ensures that the data is accurate and reliable for analysis.
- Example: Removing missing values, correcting typos, or filtering out irrelevant data.
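
The cleaning step can be sketched in pandas as follows. The data is hypothetical, and dropping rows is only one possible strategy (filling missing values is another):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one duplicate row and one missing value.
raw = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Agra"],
    "temp_c": [31.0, 31.0, np.nan, 29.5],
})

clean = (
    raw.drop_duplicates()            # remove exact duplicate rows
       .dropna(subset=["temp_c"])    # drop rows with a missing temperature
       .reset_index(drop=True)       # renumber the remaining rows from 0
)

print(clean)
```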

### **3. Use :**

- The cleaned data is analyzed and used to **generate insights**, make decisions, or train models.
- This is the phase where the data delivers its value.
- Example: Using customer data to personalize recommendations or predict future trends.

### **4. Publish :**

- Processed and analyzed data is shared with stakeholders or made public.
- Can be published in the form of **reports, dashboards, APIs, or datasets**.
- Example: Publishing COVID-19 statistics on a government website.

### **5. Reserve / Save :**

- Final stage where the data is **archived or stored** for future reference or legal purposes.
- Proper storage ensures **security, privacy, and backup**.
- Example: Saving data in cloud storage, data warehouses, or backup drives.
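
As a small illustration of saving and reloading data, the sketch below writes a DataFrame to CSV and reads it back. An in-memory `io.StringIO` buffer stands in for a real file path here, which is an assumption made only to keep the example self-contained:

```python
import io
import pandas as pd

# Hypothetical processed data to be stored.
df = pd.DataFrame({"id": [1, 2], "score": [9.5, 7.0]})

buffer = io.StringIO()            # stands in for a file on disk
df.to_csv(buffer, index=False)    # write without the row index
buffer.seek(0)                    # rewind so the data can be read back

restored = pd.read_csv(buffer)    # reload the stored data
print(restored.equals(df))
```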

---

## **3 : Types of Data in Data Science :-**

Understanding different types of data is essential for collecting, storing, and analyzing information effectively. The four categories below fall along two separate axes: by structure (structured vs. unstructured) and by source (primary vs. secondary).

### **1. Structured Data :**

- **Definition:** Data that is organized in a fixed format such as rows and columns (like in databases or spreadsheets).
- **Storage Format:** Tables, SQL databases, Excel files.
- **Easy to Search & Analyze** using tools like SQL or pandas.
- **Examples:**
  - Student records (Name, Roll No, Marks)
  - Sales data in Excel sheets
  - Bank transaction logs

### **2. Unstructured Data :**

- **Definition:** Data that does not follow a predefined model or structure.
- **Storage Format:** Text files, multimedia files, web pages.
- **Hard to Process** directly; it usually needs preprocessing (parsing, text extraction) or AI tools first.
- **Examples:**
  - Emails
  - Social media posts
  - Images, videos, audio files
  - Chat logs
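
To illustrate what preprocessing means here, the sketch below turns a few lines of free-text chat log into a structured table. The log format and the regular expression are assumptions chosen for this example; real logs will need their own parsing rules:

```python
import re
import pandas as pd

# Hypothetical chat log lines in the form "HH:MM user: message".
chat_log = [
    "10:01 asha: is the report ready?",
    "10:03 ben: almost, sending it soon",
]

# Capture the time, the user name, and the message text.
pattern = re.compile(r"^(\d{2}:\d{2}) (\w+): (.*)$")

rows = [pattern.match(line).groups() for line in chat_log]
messages = pd.DataFrame(rows, columns=["time", "user", "text"])

print(messages)
```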

### **3. Primary Data :**

- **Definition:** Data collected **directly by the researcher** for a specific purpose.
- **Usually Original, Raw, and First-hand.**
- **Collection Methods:** Surveys, interviews, experiments, observations.
- **Examples:**
  - Feedback collected through a Google Form
  - Data from a controlled experiment in a lab

### **4. Secondary Data :**

- **Definition:** Data that was **collected by someone else** for another purpose, but is reused for current research.
- **Already Available and Often Pre-processed.**
- **Sources:** Research papers, government reports, datasets from websites.
- **Examples:**
  - Population data from a government census
  - Datasets from Kaggle
  - Articles, books, and statistical yearbooks
  



---

## **4 : WH Questions for Data Collection :-**

WH questions help us collect the **right data** by focusing on key aspects such as purpose, source, time, and method. These questions guide researchers in planning and organizing their data collection process effectively.

### **1. What? :**

- **What type of data is needed?**
- **What is the goal or problem you are trying to solve?**
- Helps define the **scope** and **content** of the data.
- ✅ Example: What information do we need to predict sales?

### **2. Who? :**

- **Who is the source of the data?**
- **Who will use this data?**
- Focuses on **data origin** and **target audience**.
- ✅ Example: Who are the respondents of the survey? Who will analyze the data?

### **3. Where? :**

- **Where will the data be collected from?**
- Location or platform for data collection (online, field, database).
- ✅ Example: Where can we find customer reviews? Where is the database stored?

### **4. When? :**

- **When should the data be collected?**
- Helps set a **timeline** or decide **frequency** (real-time, monthly, yearly).
- ✅ Example: When was the data last updated? When is the best time to conduct the survey?

### **5. How? :**

- **How will the data be collected?**
- Defines the **methodology**: surveys, APIs, sensors, web scraping, etc.
- ✅ Example: How are we going to gather the data, manually or automatically?

### **6. Why? :**

- **Why is this data being collected?**
- Clarifies the **purpose** and ensures that no irrelevant data is collected.
- ✅ Example: Why do we need user location data?

### **Summary Table :**

| Question | Purpose |
|----------|---------|
| What     | Define the type and content of data |
| Who      | Identify sources and users of data |
| Where    | Know the location/platform of data |
| When     | Decide timing and frequency |
| How      | Choose data collection method |
| Why      | Clarify the reason and relevance |


--- 

## **4 : Difference Between Data Scientist, Data Analyst, and AI Engineer :-**

In the world of data and AI, different roles have different responsibilities. Here’s a simple explanation and comparison of the three key roles:


### **1. Data Analyst :**

- **Main Focus:** Understanding and visualizing past data to help in decision making.
- **Responsibilities:**
  - Clean and analyze data
  - Create charts, graphs, and dashboards
  - Find patterns and trends
- **Tools:** Excel, SQL, Tableau, Power BI, Python (pandas)

✅ **Example:** Analyzing sales data to find which product sells best in each region.

### **2. Data Scientist :**

- **Main Focus:** Making predictions and building models using advanced statistics and machine learning.
- **Responsibilities:**
  - Handle large and complex datasets
  - Build predictive models
  - Use statistics and algorithms to gain deep insights
- **Tools:** Python, R, SQL, Scikit-learn, TensorFlow, Jupyter Notebook

✅ **Example:** Predicting customer churn using machine learning models.

### **3. AI Engineer :**

- **Main Focus:** Designing and developing intelligent systems that mimic human behavior.
- **Responsibilities:**
  - Build AI-powered applications (like chatbots, image recognition, etc.)
  - Work on neural networks, deep learning, and NLP
  - Deploy models in production
- **Tools:** Python, TensorFlow, PyTorch, OpenCV, APIs

✅ **Example:** Creating a voice assistant that understands and responds to user commands.

### **Comparison Table :**

| Feature / Role        | Data Analyst                  | Data Scientist                    | AI Engineer                         |
|-----------------------|-------------------------------|------------------------------------|--------------------------------------|
| 📌 Goal               | Understand & explain data     | Predict & model data               | Build intelligent systems            |
| 🔧 Tools              | Excel, SQL, Tableau           | Python, R, ML libraries            | TensorFlow, PyTorch, OpenCV          |
| 📊 Work Type          | Reporting & visualization     | Modeling & predictions             | AI app development                   |
| 📈 Skills Needed      | Statistics, visualization     | Programming, ML, data wrangling    | Deep learning, deployment, ML        |
| 🧠 Knowledge Level    | Basic to Intermediate         | Advanced statistics & ML           | Expert in ML and AI frameworks       |
| 🧪 Example Task       | Analyze sales report          | Predict customer behavior          | Build a self-learning recommendation system |


---

## **5 : Levels of Measurement (Scales of Data) :-**

In statistics and data science, data can be classified into four levels of measurement. Each level tells us how data can be **measured**, **compared**, and **analyzed**.

### **1. Nominal Scale (Name Only) :**

- **Definition:** Categories with **no order** or ranking.
- **Characteristics:** Only labels or names; we can only **count** them.
- **Mathematical Meaning:** No arithmetic operations are meaningful; we can only count categories and find the most frequent one (the mode).
- ✅ **Examples:**
  - Gender: Male, Female
  - Blood Type: A, B, AB, O
  - Country Names
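
For nominal data, counting is the main operation available. A minimal sketch with made-up blood type labels:

```python
import pandas as pd

# Hypothetical nominal data: labels with no order.
blood_type = pd.Series(["A", "O", "B", "O", "AB", "O"])

counts = blood_type.value_counts()    # frequency of each label
most_common = blood_type.mode()[0]    # the most frequent label

print(counts)
print(most_common)
```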

### **2. Ordinal Scale (Order Matters) :**

- **Definition:** Categories with a **meaningful order**, but no fixed gap between values.
- **Characteristics:** Values can be **ranked**, but the exact difference between ranks can’t be measured.
- ✅ **Examples:**
  - Education Level: High School < Bachelor's < Master's < PhD
  - Customer Satisfaction: Poor, Average, Good, Excellent
  - Class Position: 1st, 2nd, 3rd
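
In pandas, ordinal data can be represented with an ordered `Categorical`, which allows sorting and comparisons but not arithmetic. A short sketch, assuming the satisfaction levels listed above:

```python
import pandas as pd

# The order of this list defines the ranking, lowest to highest.
levels = ["Poor", "Average", "Good", "Excellent"]

ratings = pd.Series(
    pd.Categorical(
        ["Good", "Poor", "Excellent", "Average"],
        categories=levels,
        ordered=True,
    )
)

sorted_ratings = ratings.sort_values()   # sorts by rank, not alphabetically
at_least_good = ratings >= "Good"        # comparisons respect the order
best = ratings.max()                     # highest-ranked value present

print(sorted_ratings.tolist())
print(at_least_good.tolist())
print(best)
```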

### **3. Interval Scale (Equal Intervals) :**

- **Definition:** Ordered scale with **equal intervals** between values, but **no true zero**.
- **Characteristics:** Values can be added or subtracted, but ratios don’t make sense (20°C is not twice as hot as 10°C).
- ✅ **Examples:**
  - Temperature (°C or °F)
  - Calendar years (e.g., 1990, 2000)
  - IQ Scores
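
A short worked example shows why ratios fail on an interval scale. Celsius has an arbitrary zero, so dividing two Celsius readings gives a number with no physical meaning; converting to Kelvin (a ratio scale) gives the meaningful ratio:

```python
# Two temperature readings in degrees Celsius (illustrative values).
c_low, c_high = 10.0, 20.0

difference = c_high - c_low      # differences are meaningful: 10 degrees apart
naive_ratio = c_high / c_low     # 2.0, but 20°C is not "twice as hot" as 10°C

# Kelvin has a true zero, so ratios are meaningful there.
k_low, k_high = c_low + 273.15, c_high + 273.15
true_ratio = k_high / k_low      # roughly 1.035

print(difference, naive_ratio, round(true_ratio, 3))
```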

### **4. Ratio Scale (True Zero Present) :**

- **Definition:** Same as interval scale, but includes a **true zero** point.
- **Characteristics:** All mathematical operations possible: add, subtract, multiply, divide.
- ✅ **Examples:**
  - Height, Weight, Age
  - Income, Distance, Time
  - Marks obtained in a test (0 means none)


---

## **6 : Division of Data (Types and Levels) :-**

In statistics, data is divided mainly into **two types**:  
1. Qualitative (Categorical)  
2. Quantitative (Numerical)

Each type has further **subtypes** based on the nature and behavior of data.

### **1. Qualitative Data (Categorical Data) :**

- **Definition:** Data that describes qualities or categories.
- **Not measurable in numbers** (but can be counted).
- Used for **classification or labeling**.

#### **Subtypes of Qualitative Data :**

##### **i). Nominal Scale :**
- Categories with **no order or ranking**.
- ✅ Examples: Gender (Male, Female), Blood Type (A, B, AB, O)

##### **ii). Ordinal Scale :**
- Categories with a **meaningful order**, but **no fixed gap** between them.
- ✅ Examples: Rank (1st, 2nd, 3rd), Satisfaction (Good, Better, Best)

### **2. Quantitative Data (Numerical Data) :**

- **Definition:** Data that is measurable and expressed in numbers.
- Used for **calculations and analysis**.

#### **Subtypes of Quantitative Data :**

##### **i). Discrete Data :**
- **Whole numbers only** (no fractions/decimals).
- Countable values.
- ✅ Examples: Number of students, number of cars in a parking lot

##### **ii). Continuous Data :**
- Can take **any value** within a range (including decimals).
- Measurable with instruments.

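##### **Types of Continuous Data :**

Continuous measurements are commonly classified further by whether the scale has a true zero. The same two scales can also describe discrete numerical data such as counts.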

##### **a). Interval Scale :**
- Equal intervals, **no true zero**.
- ✅ Examples: Temperature (°C, °F), IQ Scores

##### **b). Ratio Scale :**
- Equal intervals, **true zero exists**.
- ✅ Examples: Height, Weight, Age, Distance



--- 

## **7 : Types of Data Analysis :-**

Data analysis helps us understand, explain, and make decisions from data. There are **4 main types** of data analysis, each with a different goal:

### **1. Descriptive Analysis :**

- **Purpose:** Answer the question **"What happened?"**
- Condenses raw data into simple summaries and visualizations.
- Shows **trends, patterns, and averages**.
- ✅ **Examples:**
  - Average monthly sales
  - Number of website visitors per day
  - Pie charts and bar graphs
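
A minimal descriptive-analysis sketch in pandas, assuming a small made-up table of monthly revenue:

```python
import pandas as pd

# Hypothetical monthly sales figures.
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [120.0, 135.0, 90.0, 140.0],
})

average_revenue = sales["revenue"].mean()   # average monthly sales
summary = sales["revenue"].describe()       # count, mean, std, min, quartiles, max

print(average_revenue)
print(summary)
```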

### **2. Diagnostic Analysis :**

- **Purpose:** Answer the question **"Why did it happen?"**
- Goes deeper to find **causes** or reasons behind trends.
- Uses comparisons, correlations, and drill-downs.
- ✅ **Examples:**
  - Why did sales drop in March?
  - Why are users leaving the website?
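
A drill-down is one simple diagnostic technique: break a total into groups and see which group drives the change. The sketch below uses invented regional figures to ask where a March drop came from. It shows where the drop occurred, not its underlying cause:

```python
import pandas as pd

# Hypothetical revenue by month and region.
sales = pd.DataFrame({
    "month":   ["Feb", "Feb", "Mar", "Mar"],
    "region":  ["North", "South", "North", "South"],
    "revenue": [70.0, 65.0, 68.0, 22.0],
})

# One row per region, one column per month.
by_region = sales.pivot_table(
    index="region", columns="month", values="revenue", aggfunc="sum"
)

change = by_region["Mar"] - by_region["Feb"]   # month-over-month change
worst_region = change.idxmin()                 # region with the largest drop

print(change)
print(worst_region)
```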

### **3. Predictive Analysis :**

- **Purpose:** Answer the question **"What is likely to happen?"**
- Uses **historical data + statistical models + machine learning** to make future predictions.
- ✅ **Examples:**
  - Predicting customer churn
  - Forecasting next month’s revenue
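
As a very small predictive sketch, the code below fits a straight-line trend to made-up revenue history with `numpy.polyfit` and extrapolates one month ahead. A linear trend is an assumption chosen for simplicity; real forecasting would typically use richer models and validation:

```python
import numpy as np

# Hypothetical history: revenue for months 1 to 5.
months = np.array([1, 2, 3, 4, 5])
revenue = np.array([100.0, 110.0, 120.0, 130.0, 140.0])

# Fit revenue = slope * month + intercept (degree-1 polynomial).
slope, intercept = np.polyfit(months, revenue, deg=1)

# Extrapolate the fitted line to month 6.
forecast_month_6 = slope * 6 + intercept

print(round(forecast_month_6, 2))
```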

### **4. Prescriptive Analysis :**

- **Purpose:** Answer the question **"What should we do?"**
- Suggests the **best course of action** using AI, optimization, and simulations.
- Builds on insights from diagnostic and predictive analysis.
- ✅ **Examples:**
  - Recommending the best marketing strategy
  - Optimizing delivery routes
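
To make the delivery-route example concrete, the sketch below tries every visiting order for three stops and recommends the shortest round trip. The distances are invented, and brute force is used only because the problem is tiny; it does not scale to many stops:

```python
from itertools import permutations

# Hypothetical symmetric distances (km) between a depot and three stops.
dist = {
    ("depot", "A"): 4, ("depot", "B"): 7, ("depot", "C"): 3,
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 5,
}

def distance(a, b):
    """Look up the distance in either direction."""
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

def route_length(order):
    """Total length of depot -> stops in the given order -> depot."""
    path = ("depot", *order, "depot")
    return sum(distance(a, b) for a, b in zip(path, path[1:]))

# Evaluate every possible order and pick the shortest.
best_route = min(permutations(["A", "B", "C"]), key=route_length)

print(best_route, route_length(best_route))
```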

