# Data Analysis Core Concepts

### 1. What is Data?

Data is a collection of facts, observations, or measurements that can be recorded and used for analysis.

It can be numbers, text, images, videos, dates, or any information that helps us understand something.

#### Examples of Data
| Type | Example |
| --- | --- |
| Numeric | Salary = 45,000 |
| Text | Name = "Rahul" |
| Date | "2025-12-10" |
| Boolean | IsActive = True |
| Image | Photo of employee |

#### Key Points

Data is raw information.

Data alone has no meaning until we analyze it.

After processing, data becomes information, which helps in decision-making.

#### Simple Difference:
| Term | Meaning |
| --- | --- |
| Data | Raw facts (e.g., 80, 90, 70) |
| Information | Processed data (e.g., Average = 80) |
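The step from data to information can be sketched in a few lines of plain Python. This is only an illustration using the three values from the table above; the variable names are made up for the example.

```python
# Raw data: the three values from the table above.
scores = [80, 90, 70]

# Processing step: summarise the raw values into a single figure.
average = sum(scores) / len(scores)

print(f"Data: {scores}")
print(f"Information: Average = {average:.0f}")  # Information: Average = 80
```

The list on its own says little. The average is the processed result that can support a decision.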

### 2. What is a Dataset?

A dataset is a collection of related data arranged in a structured form, usually in rows and columns like a table.

Think of it as a group of multiple data points stored together for analysis.

#### Example of a Dataset (Table Form)
| EmpID | EmpName | Salary | City |
| --- | --- | --- | --- |
| 101 | Amit | 45000 | Pune |
| 102 | Riya | 52000 | Mumbai |
| 103 | John | 60000 | Delhi |

This whole table = Dataset

Each:

Row → record/data point

Column → attribute/feature
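As a rough sketch, the same employee table can be built as a pandas DataFrame to see rows and columns in code. The variable names (`employees`, `first_record`) are chosen for this example only.

```python
import pandas as pd

# The employee table from the example above, written as a dictionary of columns.
employees = pd.DataFrame(
    {
        "EmpID": [101, 102, 103],
        "EmpName": ["Amit", "Riya", "John"],
        "Salary": [45000, 52000, 60000],
        "City": ["Pune", "Mumbai", "Delhi"],
    }
)

# shape returns (number of rows, number of columns).
n_rows, n_cols = employees.shape
print(f"{n_rows} rows (records), {n_cols} columns (attributes)")

# One row = one record: everything known about a single employee.
first_record = employees.iloc[0]
print(first_record.to_dict())

# One column = one attribute: the same kind of value for every employee.
print(employees["Salary"].tolist())  # [45000, 52000, 60000]
```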

#### Real-life examples of datasets

1. Excel sheet

2. CSV file

3. Database table

4. Survey responses

5. Sales records

6. Employee attendance sheet

7. Machine-learning training data

#### Simple definition:

Dataset = Structured collection of data used for analysis.

In [6]:
import pandas as pd

In [12]:
data = pd.read_csv(r"C:\Users\Prafull Wahatule\Desktop\Python_DA\Data\sales_data_50rows.csv")
data.head(5)

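|   | Date | Product | Quantity | Price | Total | Region |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 2025-12-09 | Grapes | 14 | 394 | 5516 | West |
| 1 | 2025-12-08 | Mango | 9 | 376 | 3384 | South |
| 2 | 2025-12-07 | Orange | 26 | 58 | 1508 | East |
| 3 | 2025-12-06 | Mango | 2 | 393 | 786 | North |
| 4 | 2025-12-05 | Mango | 20 | 178 | 3560 | North |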


### 3. What are Cases in a Dataset?

Cases are the individual records or rows in a dataset.

Each case represents one observation, one item, or one instance of data.

#### Example Dataset
data = pd.read_csv(r"C:\Users\Prafull Wahatule\Desktop\Python_DA\Data\sales_data_50rows.csv")

Here, using the rows shown by data.head(5):

Case 1 → Row 1 (the Grapes sale on 2025-12-09)

Case 2 → Row 2 (the Mango sale on 2025-12-08)

Case 3 → Row 3 (the Orange sale on 2025-12-07)

So this dataset has 50 cases.


#### Simple Definition:

A case is one row of a dataset that contains information about one object/person/event.
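To make the idea of a case concrete, here is a small sketch that counts and prints cases. It assumes a hand-typed copy of the five rows printed by `data.head(5)` rather than the real CSV file, so the count is 5 here and not 50.

```python
import pandas as pd

# Hypothetical in-memory copy of the five rows printed by data.head(5);
# the full CSV file has 50 rows.
sample = pd.DataFrame(
    {
        "Date": ["2025-12-09", "2025-12-08", "2025-12-07", "2025-12-06", "2025-12-05"],
        "Product": ["Grapes", "Mango", "Orange", "Mango", "Mango"],
        "Quantity": [14, 9, 26, 2, 20],
        "Price": [394, 376, 58, 393, 178],
        "Total": [5516, 3384, 1508, 786, 3560],
        "Region": ["West", "South", "East", "North", "North"],
    }
)

# Number of cases = number of rows.
n_cases = len(sample)
print(f"Number of cases: {n_cases}")

# Each case is a single row; loop over the rows to see them one at a time.
for position, case in enumerate(sample.itertuples(index=False), start=1):
    print(f"Case {position}: {case.Product} sold in {case.Region} on {case.Date}")
```

On the full dataset the same `len(data)` call would return the first number of `data.shape`.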

#### Other Names for “Case”

Cases are also known as:

Rows

Records

Observations

Instances

Examples (in machine learning)

In [13]:
data.shape

(50, 6)

In [14]:
data.columns

Index(['Date', 'Product', 'Quantity', 'Price', 'Total', 'Region'], dtype='object')

### 4. What are Variables in a Dataset?

Variables are the columns of a dataset.
Each column represents one type of information that is collected for every case (row).

A variable describes a particular property, characteristic, or attribute of the data.

#### Example

If your dataset has columns:

['Date', 'Product', 'Quantity', 'Price', 'Total', 'Region']


Then each of these is a variable.

1. Date → When the sale happened

2. Product → What item was sold

3. Quantity → How many units

4. Price → Cost per unit

5. Total → Quantity × Price

6. Region → Where the sale occurred

#### Simple Definition:

A variable = a column that stores the same type of information for all rows.
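The sketch below shows variables as columns, again assuming a hand-typed copy of the first five rows rather than the real file. It also checks the claim that Total is Quantity × Price, which holds for these five rows.

```python
import pandas as pd

# Hypothetical copy of four of the six variables for the first five cases.
sample = pd.DataFrame(
    {
        "Product": ["Grapes", "Mango", "Orange", "Mango", "Mango"],
        "Quantity": [14, 9, 26, 2, 20],
        "Price": [394, 376, 58, 393, 178],
        "Total": [5516, 3384, 1508, 786, 3560],
    }
)

# Each column name is one variable.
print(list(sample.columns))

# Selecting a variable returns that column's value for every case.
print(sample["Product"].tolist())

# Total is a derived variable: Quantity × Price, computed row by row.
recomputed_total = sample["Quantity"] * sample["Price"]
print((recomputed_total == sample["Total"]).all())  # True
```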

### 5. Types of Variables / Data Types

Variables (columns) in a dataset can be of different types based on the kind of data they store.
They are mainly classified into two big groups:

#### A. Categorical Variables (Qualitative Data)

These represent categories, labels, or names.
They do not have numerical meaning.

##### Types of Categorical Data:

1. Nominal: no order.
Example: City, Product, Gender

2. Ordinal: has an order or ranking.
Example: Low/Medium/High, Ratings (1–5), Grade (A, B, C)
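One way to express the nominal/ordinal difference in pandas is the `category` dtype, with `ordered=True` for ordinal data. This is a sketch with made-up values, not part of the sales dataset.

```python
import pandas as pd

# Nominal: categories with no inherent order.
city = pd.Series(["Pune", "Mumbai", "Delhi", "Pune"], dtype="category")

# Ordinal: categories with an explicit ranking, listed lowest to highest.
priority = pd.Series(
    pd.Categorical(
        ["Medium", "Low", "High", "Low"],
        categories=["Low", "Medium", "High"],
        ordered=True,
    )
)

# Counting works for both kinds of categorical data.
print(city.value_counts().to_dict())

# Sorting and comparing only make sense for ordinal data.
print(priority.sort_values().tolist())  # ['Low', 'Low', 'Medium', 'High']
print(priority.max())                   # High
print((priority > "Low").tolist())      # [True, False, True, False]
```

Without the explicit `categories` order, sorting would fall back to alphabetical order (High, Low, Medium), which is not the intended ranking.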

#### B. Numerical Variables (Quantitative Data)

These represent numbers and can be measured.

##### Types of Numerical Data:

1. Discrete: whole numbers.
Example: Quantity, Number of items, Count of employees

2. Continuous: decimal values allowed.
Example: Price, Height, Weight, Temperature
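In pandas, discrete values usually end up in an integer dtype and continuous values in a floating-point dtype. The sketch below uses the Quantity values from the sample rows and a made-up weight column. Note that the stored dtype only reflects the values seen: a continuous variable such as Price can still load as integers if the file happens to contain no decimals.

```python
import pandas as pd

# Discrete: counts stored as whole numbers.
quantity = pd.Series([14, 9, 26, 2, 20])

# Continuous: measurements that can take fractional values
# (hypothetical weights in kilograms).
weight_kg = pd.Series([1.25, 0.8, 2.4, 0.35, 1.9])

print(quantity.dtype)   # an integer dtype
print(weight_kg.dtype)  # a floating-point dtype

# pandas offers helper functions to check which kind a column is.
print(pd.api.types.is_integer_dtype(quantity))  # True
print(pd.api.types.is_float_dtype(weight_kg))   # True
```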


### 6. Why Are Data Types Important?

Data types are important because they decide how data is stored, processed, and analyzed in a dataset.
If the data type is wrong, your calculations, charts, filters, and analysis will also be wrong.

#### Reasons Why Data Types Matter
1. Correct Calculations

If a numeric column is stored as text, mathematical operations either fail or silently give the wrong result.
Example:

"100" + "50" → '10050' (wrong: the two strings are joined)

100 + 50 → 150 (correct)

2. Efficient Data Storage

Different data types use different memory sizes.
Correct data types help the program run faster and save memory.

3. Accurate Analysis

Statistical functions (mean, sum, median, correlation) only work on numerical data.
Numbers stored wrongly as text will break the analysis.

4. Proper Sorting and Filtering

Dates must be datetime

Prices must be numeric

Categories must be categorical

Otherwise:

Sorting becomes incorrect

Filters do not work properly

5. Avoids Errors in Models / Dashboards

Wrong data type = wrong results
Example: Power BI / Excel dashboards behave incorrectly if date is treated as text.

6. Helps Tools Understand the Data

Machine learning libraries, pandas, SQL, and Excel all depend on correct types:
Categorical → used for grouping

Numerical → used for calculations

Datetime → used for time-series analysis
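The three roles listed above can be sketched with pandas on a hand-typed copy of the five sample rows (an assumption for this example; the real file has 50 rows). Region is used for grouping, Total for calculation, and Date for date arithmetic.

```python
import pandas as pd

# Hypothetical copy of three columns from the first five rows.
sales = pd.DataFrame(
    {
        "Date": ["2025-12-09", "2025-12-08", "2025-12-07", "2025-12-06", "2025-12-05"],
        "Total": [5516, 3384, 1508, 786, 3560],
        "Region": ["West", "South", "East", "North", "North"],
    }
)

# Categorical → grouping: one group per region.
# Numerical → calculation: add up Total inside each group.
total_by_region = sales.groupby("Region")["Total"].sum()
print(total_by_region)

# Datetime → time-based analysis: convert the text dates first,
# then date arithmetic becomes possible.
sales["Date"] = pd.to_datetime(sales["Date"])
span = sales["Date"].max() - sales["Date"].min()
print(f"The sample covers {span.days} days")  # The sample covers 4 days
```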

#### Simple Summary 

Correct data type = correct analysis.
Wrong data type = wrong results.
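A short plain-Python sketch shows how a wrong type produces wrong results in both calculation and sorting. The three values are taken from the sample rows but stored as text on purpose.

```python
# Numbers that arrived as text, for example from a badly formatted file.
values_as_text = ["394", "58", "1508"]

# Adding text joins the characters instead of adding the values.
print(values_as_text[0] + values_as_text[1])  # 39458

# Sorting text compares character by character, so "1508" comes first.
print(sorted(values_as_text))  # ['1508', '394', '58']

# Converting to integers restores the expected behaviour.
values = [int(v) for v in values_as_text]
print(values[0] + values[1])  # 452
print(sorted(values))         # [58, 394, 1508]
```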

In [16]:
# Split the columns by the data type pandas inferred for them.
NumericColumn = data.select_dtypes(include=["number"])
CategoricalColumn = data.select_dtypes(include=["object"])  # text columns
BoolColumn = data.select_dtypes(include=["bool"])

# Count how many columns fall into each group.
print(len(NumericColumn.columns))
print(len(CategoricalColumn.columns))
print(len(BoolColumn.columns))

3
3
0
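In the output above, three columns are counted as object. With six columns and three numeric ones, those are presumably Date, Product, and Region, which means the dates are still plain text. The sketch below shows one possible way to give each column a more fitting type. It runs on a hand-typed copy of the five sample rows, and the choice of `category` for Product and Region is an assumption, not something the notebook does.

```python
import pandas as pd

# Hypothetical in-memory copy of the five rows printed by data.head(5).
sample = pd.DataFrame(
    {
        "Date": ["2025-12-09", "2025-12-08", "2025-12-07", "2025-12-06", "2025-12-05"],
        "Product": ["Grapes", "Mango", "Orange", "Mango", "Mango"],
        "Quantity": [14, 9, 26, 2, 20],
        "Price": [394, 376, 58, 393, 178],
        "Total": [5516, 3384, 1508, 786, 3560],
        "Region": ["West", "South", "East", "North", "North"],
    }
)

# Give each column the type that matches its meaning.
sample["Date"] = pd.to_datetime(sample["Date"])
sample["Product"] = sample["Product"].astype("category")
sample["Region"] = sample["Region"].astype("category")

print(sample.dtypes)

# Re-count the columns by type after the conversion.
numeric_cols = sample.select_dtypes(include=["number"]).columns.tolist()
datetime_cols = sample.select_dtypes(include=["datetime"]).columns.tolist()
category_cols = sample.select_dtypes(include=["category"]).columns.tolist()

print(numeric_cols)   # ['Quantity', 'Price', 'Total']
print(datetime_cols)  # ['Date']
print(category_cols)  # ['Product', 'Region']
```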


### 7. How Do You Collect Information for Different Data Types?