
## Types of Variables:
### 1. Categorical and Numerical Variables:

- Categorical (Qualitative) Variables:
  - Nominal Variables: Categories with no inherent order or ranking (e.g., colors, gender).
  - Ordinal Variables: Categories with a meaningful order or ranking (e.g., education levels, survey ratings).
- Numerical (Quantitative) Variables:
  - Discrete Variables: Countable, taking distinct, separate values (e.g., number of students in a class).
  - Continuous Variables: Can take any value within a range (e.g., height, weight).
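The variable type determines which summary of the "center" makes sense. The sketch below uses:

- the mode for nominal data,
- the median of ranks for ordinal data,
- the mean for numerical data.

The survey data and the `rating_order` ranking are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical survey data, one list per variable type
colors = ["Red", "Blue", "Red", "Green", "Red"]            # nominal
ratings = ["Low", "High", "Medium", "Medium", "High"]      # ordinal
rating_order = ["Low", "Medium", "High"]                   # assumed ranking, lowest first
students_per_class = [28, 31, 25, 30, 31]                  # numerical, discrete
heights_cm = [162.5, 170.2, 158.9, 181.0, 175.4]           # numerical, continuous

# Nominal: only "equal / not equal" is meaningful, so the mode is the natural center
print("Most common color:", statistics.mode(colors))

# Ordinal: order is meaningful, so map levels to ranks and take the median rank.
# median_low always returns an observed rank rather than averaging two levels.
ranks = [rating_order.index(r) for r in ratings]
median_rating = rating_order[statistics.median_low(ranks)]
print("Median rating:", median_rating)

# Numerical: arithmetic is meaningful, so the mean can be used
print("Mean class size:", statistics.mean(students_per_class))
print("Mean height (cm):", round(statistics.mean(heights_cm), 1))
```

Averaging nominal labels would be meaningless. Averaging ordinal codes assumes equal spacing between levels, which the data do not guarantee.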

### 2. Dependent and Independent Variables:
- Dependent Variable (Response Variable): The outcome being studied and measured; its values are expected to depend on the independent variable(s).
- Independent Variable (Explanatory Variable): The variable that is manipulated or controlled in an experiment (or simply observed in an observational study). It is used to predict or explain changes in the dependent variable.

### 3. Role of Variables in Hypothesis Testing:
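In hypothesis testing, the hypotheses are usually statements about the dependent variable (for example, whether its mean differs across groups). The independent variable defines the groups or conditions being compared, and in an experiment it is manipulated to observe its effect on the dependent variable.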
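Here is a minimal sketch of these roles, assuming hypothetical exam scores for two teaching methods:

- The independent variable is the teaching method (group A or group B).
- The dependent variable is the exam score.

It uses a permutation test, which is only one of several possible tests; a two-sample t-test such as `scipy.stats.ttest_ind` is a common alternative. The test asks whether the observed difference in mean scores could plausibly arise by chance if the method had no effect.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical experiment: independent variable = teaching method (A or B),
# dependent variable = exam score
scores_a = [78, 85, 82, 88, 90, 84]
scores_b = [72, 75, 80, 70, 77, 74]
observed_diff = mean(scores_a) - mean(scores_b)

# H0: the method has no effect, so the group labels are exchangeable.
# Shuffle the pooled scores many times and count how often a difference in
# means at least as large (in absolute value) as the observed one appears.
rng = random.Random(0)              # fixed seed for reproducibility
pooled = scores_a + scores_b
n_a = len(scores_a)
n_permutations = 10_000
as_extreme = 0
for _ in range(n_permutations):
    rng.shuffle(pooled)
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed_diff):
        as_extreme += 1

# Add-one correction keeps the estimated p-value strictly above zero
p_value = (as_extreme + 1) / (n_permutations + 1)
print(f"Observed difference in means: {observed_diff:.2f}")
print(f"Two-sided permutation p-value: {p_value:.4f}")
```

A small p-value suggests the difference in the dependent variable is unlikely under the null hypothesis of no effect.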

### 4. Role of Variables in Regression Analysis:
In regression analysis, variables are used to model the relationship between a dependent variable and one or more independent variables. For example, house price (the dependent variable) can be predicted from features (independent variables) such as square footage and number of bedrooms.
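The house-price example can be sketched as an ordinary least-squares fit with NumPy's `np.linalg.lstsq`. The data are hypothetical. They were constructed to follow price = 50000 + 150 × sqft + 10000 × bedrooms exactly, so the fit should recover those coefficients; real prices would include noise.

```python
import numpy as np

# Hypothetical independent variables
sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450])
bedrooms = np.array([3, 3, 2, 4, 2, 3, 4, 4])

# Hypothetical dependent variable, built as an exact linear function for illustration
price = 50_000 + 150 * sqft + 10_000 * bedrooms

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones(len(sqft)), sqft, bedrooms])

# Ordinary least squares: minimize ||X @ coef - price||^2
coef, residuals, rank, _ = np.linalg.lstsq(X, price, rcond=None)
intercept, b_sqft, b_bedrooms = coef
print(f"price = {intercept:,.0f} + {b_sqft:.1f} * sqft + {b_bedrooms:,.0f} * bedrooms")

# Predict the price of a new (hypothetical) 2000 sq ft, 3-bedroom house
new_house = np.array([1.0, 2000, 3])
predicted = new_house @ coef
print(f"Predicted price: {predicted:,.0f}")
```

With real, noisy data the coefficients would only be estimates, and the residuals would be nonzero.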

### 5. Data Types:
- Qualitative Data: Describes qualities or characteristics; it is categorical (nominal or ordinal).
- Quantitative Data: Consists of numerical values and can be discrete or continuous.

### 6. Variable Distributions:
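- Normal Distribution: A symmetric, bell-shaped distribution that approximates many natural phenomena well.
- Skewed Distribution: An asymmetric distribution in which one tail is longer than the other. In a right (positively) skewed distribution, the long tail is on the right and the mean typically exceeds the median. In a left (negatively) skewed distribution, the reverse holds.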
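One common way to quantify skew is the Fisher-Pearson coefficient $g_1 = m_3 / m_2^{3/2}$, where $m_k$ is the k-th central moment. `scipy.stats.skew` computes this quantity with its default `bias=True`. The sketch below implements the formula directly and compares mean and median on two small hypothetical datasets.

```python
import statistics

def skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5, using population (divide-by-n) moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 10]   # one long tail to the right

for name, xs in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(f"{name}: skewness={skewness(xs):.3f}, "
          f"mean={statistics.mean(xs):.2f}, median={statistics.median(xs)}")
```

A value near 0 indicates symmetry. A positive value indicates a longer right tail, and here the mean is pulled above the median by the value 10.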

In [2]:
# A nominal variable is a categorical variable without any inherent order or ranking.

# Nominal variable representing colors of cars
car_colors = ['Red', 'Blue', 'Green', 'Yellow', 'Black', 'White']

# Print the nominal variable
print("Colors of Cars:", car_colors)


Colors of Cars: ['Red', 'Blue', 'Green', 'Yellow', 'Black', 'White']


In [3]:
# An ordinal variable is a categorical variable with a meaningful order or ranking.

# Ordinal variable representing educational levels
education_levels = ['High School', 'Associate', 'Bachelor', 'Master', 'PhD']

# Print the ordinal variable
print("Educational Levels:", education_levels)


Educational Levels: ['High School', 'Associate', 'Bachelor', 'Master', 'PhD']
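A plain list of labels does not by itself tell Python that 'Master' ranks above 'Bachelor'. One way to make the order usable is to map each level to its position in `education_levels`. The survey responses below are hypothetical. (pandas offers `pd.Categorical(..., ordered=True)` for the same purpose.)

```python
# Ranking of the levels, lowest first (same order as education_levels above)
education_levels = ['High School', 'Associate', 'Bachelor', 'Master', 'PhD']
rank = {level: i for i, level in enumerate(education_levels)}

# Hypothetical survey responses
responses = ['Master', 'High School', 'PhD', 'Bachelor', 'Associate', 'Bachelor']

# Alphabetical sorting ignores the ranking...
print(sorted(responses))

# ...so sort by rank instead
by_rank = sorted(responses, key=rank.get)
print(by_rank)

# Order-based comparisons are meaningful for ordinal data
at_least_bachelor = [r for r in responses if rank[r] >= rank['Bachelor']]
print(at_least_bachelor)
```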


In [4]:
# An interval variable is a numerical variable where the differences between values are meaningful, but there is no true zero point.

# Interval variable representing temperatures in Celsius
temperature_celsius = [20, 25, 30, 15, 10]

# Print the interval variable
print("Temperatures in Celsius:", temperature_celsius)


Temperatures in Celsius: [20, 25, 30, 15, 10]
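To see what "no true zero point" means in practice, the sketch below compares a difference and a ratio of two of these temperatures. The Kelvin conversion (K = °C + 273.15) is added for illustration. Kelvin is a ratio scale because 0 K is absolute zero.

```python
temperature_celsius = [20, 25, 30, 15, 10]

def to_kelvin(celsius):
    """Convert Celsius to Kelvin, a ratio scale with a true zero (absolute zero)."""
    return celsius + 273.15

t_high, t_low = temperature_celsius[0], temperature_celsius[4]   # 20 °C and 10 °C

# Differences are meaningful on an interval scale
print("Difference:", t_high - t_low, "degrees")

# Ratios are not: 0 °C is an arbitrary zero, so 20 °C is not "twice as hot" as 10 °C
print("Naive Celsius ratio:", t_high / t_low)
print("Kelvin ratio:", round(to_kelvin(t_high) / to_kelvin(t_low), 3))
```

The 10-degree difference is the same in Celsius and Kelvin, but the ratio is only meaningful on the Kelvin scale.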



### Frequency Distribution

In [6]:
# Exam scores dataset
exam_scores = [85, 92, 88, 78, 95, 88, 90, 92, 85, 88]

# Calculate frequency distribution
frequency_dist = {}
for score in exam_scores:
    if score in frequency_dist:
        frequency_dist[score] += 1
    else:
        frequency_dist[score] = 1

# Print the frequency distribution
print("Frequency Distribution:")
for score, count in frequency_dist.items():
    print(f"Score {score}: {count} {'time' if count == 1 else 'times'}")


Frequency Distribution:
Score 85: 2 times
Score 92: 2 times
Score 88: 3 times
Score 78: 1 time
Score 95: 1 time
Score 90: 1 time
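The same frequency table can be built with `collections.Counter` from the standard library. This sketch also adds relative frequencies (count divided by the number of observations). The name `score_counts` is chosen here so that the `frequency_dist` dictionary above is left untouched.

```python
from collections import Counter

exam_scores = [85, 92, 88, 78, 95, 88, 90, 92, 85, 88]

# Counter builds the same score -> count mapping as the loop above in one call
score_counts = Counter(exam_scores)
n = len(exam_scores)

# Relative frequency: each count as a share of all observations
for score, count in sorted(score_counts.items()):
    print(f"Score {score}: count {count}, relative frequency {count / n:.0%}")
```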


### Cumulative Frequency

In [7]:
# Calculate cumulative frequency
cumulative_frequency = {}
total = 0
for score, count in sorted(frequency_dist.items()):
    total += count
    cumulative_frequency[score] = total

# Print the cumulative frequency
print("\nCumulative Frequency:")
for score, cum_freq in cumulative_frequency.items():
    print(f"Score {score}: Cumulative Frequency {cum_freq}")



Cumulative Frequency:
Score 78: Cumulative Frequency 1
Score 85: Cumulative Frequency 3
Score 88: Cumulative Frequency 6
Score 90: Cumulative Frequency 7
Score 92: Cumulative Frequency 9
Score 95: Cumulative Frequency 10
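An equivalent sketch uses the standard library's `itertools.accumulate`, which produces the running totals directly. It also reports the cumulative relative frequency, i.e. the share of scores at or below each value.

```python
from collections import Counter
from itertools import accumulate

exam_scores = [85, 92, 88, 78, 95, 88, 90, 92, 85, 88]
counts = Counter(exam_scores)
n = len(exam_scores)

scores = sorted(counts)                                    # distinct scores, ascending
cumulative = list(accumulate(counts[s] for s in scores))   # running total of the counts

for score, cum in zip(scores, cumulative):
    # Cumulative relative frequency: share of all scores at or below this one
    print(f"Score {score}: cumulative frequency {cum} ({cum / n:.0%} of scores)")
```

The last cumulative frequency always equals the total number of observations (here 10), which is a quick sanity check.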


In [8]:
# Express a part (students who passed) as a percentage of the whole (all students)
total_students = 50
passing_students = 35

passing_percentage = (passing_students / total_students) * 100
print(f"Passing Percentage: {passing_percentage}%")


Passing Percentage: 70.0%


## Percentile:


#### A percentile indicates the relative standing of a value within a dataset: the p-th percentile is the value below which roughly p% of the data points fall. (The related percentile rank of a value is the percentage of data points below it.)

In [9]:
import numpy as np

# Calculate the 75th percentile (Q3) of a dataset
data = [10, 15, 20, 25, 30, 35, 40, 45, 50]
percentile_75 = np.percentile(data, 75)
print(f"75th Percentile (Q3): {percentile_75}")


75th Percentile (Q3): 40.0
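To show where the value 40.0 comes from, here is a sketch of the linear-interpolation rule that `np.percentile` uses by default. The p-th percentile sits at the fractional position (n - 1) * p / 100 in the sorted data. The helper names are hypothetical. `percentile_rank` uses one common definition (the percentage of values strictly below x); other conventions treat ties differently.

```python
def percentile_linear(values, p):
    """p-th percentile (0 <= p <= 100) by linear interpolation between
    neighbouring sorted values, the same rule as NumPy's default method."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100        # fractional index into the sorted data
    lo = int(pos)                        # floor, since pos is non-negative
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac

def percentile_rank(values, x):
    """Percentage of data points strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)

data = [10, 15, 20, 25, 30, 35, 40, 45, 50]
print(percentile_linear(data, 75))   # same value as np.percentile(data, 75)
print(percentile_linear(data, 30))   # falls between 20 and 25, so it is interpolated
print(percentile_rank(data, 40))
```

For the 75th percentile, (9 - 1) * 75 / 100 = 6 lands exactly on the 7th sorted value, 40, so no interpolation is needed.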


## Quartiles:

#### Quartiles divide a sorted dataset into four parts containing roughly equal numbers of observations. The three quartiles, Q1, Q2 (the median), and Q3, are the 25th, 50th, and 75th percentiles, respectively.

In [10]:
import numpy as np

# Calculate quartiles of a dataset
data = [12, 18, 20, 24, 28, 32, 36, 40, 44]
q1 = np.percentile(data, 25)
median = np.percentile(data, 50)
q3 = np.percentile(data, 75)

print(f"Q1: {q1}, Median: {median}, Q3: {q3}")


Q1: 20.0, Median: 28.0, Q3: 36.0


## Five-Number Summary in Statistics:



##### The five-number summary is a descriptive statistic that concisely summarizes the distribution of a dataset. It consists of five key values: the minimum, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the maximum. Together they describe the center (median) and the spread (range and interquartile range) of the data.

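### Identifying Outliers with the IQR Rule

Before summarizing, outliers are commonly flagged (and, depending on the analysis, removed) using fences based on the interquartile range:

- Interquartile range: $IQR = Q_3 - Q_1$
- Lower fence: $LF = Q_1 - 1.5 \times IQR$
- Upper fence: $UF = Q_3 + 1.5 \times IQR$
- Values below $LF$ or above $UF$ are treated as outliers; values in $[LF, UF]$ are kept.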
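Here is a minimal sketch of the fence rule with NumPy, on a hypothetical dataset that contains one obviously extreme value. Whether flagged values should be dropped, kept, or investigated depends on the analysis; the code only separates them.

```python
import numpy as np

# Hypothetical dataset with one unusually large value
values = np.array([12, 15, 17, 18, 20, 21, 23, 25, 95])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Boolean mask of the values that lie inside [lower_fence, upper_fence]
inside = (values >= lower_fence) & (values <= upper_fence)
outliers = values[~inside]
cleaned = values[inside]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
print("Outliers:", outliers)
print("Data without outliers:", cleaned)
```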

In [11]:
import numpy as np

# Calculate the five-number summary of a dataset
data = [15, 20, 22, 25, 28, 32, 36, 40, 45]

minimum = np.min(data)
q1 = np.percentile(data, 25)
median = np.median(data)
q3 = np.percentile(data, 75)
maximum = np.max(data)

print(f"Minimum: {minimum}, Q1: {q1}, Median: {median}, Q3: {q3}, Maximum: {maximum}")


Minimum: 15, Q1: 22.0, Median: 28.0, Q3: 36.0, Maximum: 45
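The same calculation can be wrapped in a small helper (the name `five_number_summary` is hypothetical) and combined with the IQR fences. For the dataset above, IQR = 36 - 22 = 14, so the fences are 1 and 57. No value is flagged, so the summary is unchanged by outlier removal.

```python
import numpy as np

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using NumPy's default linear percentiles."""
    arr = np.asarray(values, dtype=float)
    return tuple(float(v) for v in np.percentile(arr, [0, 25, 50, 75, 100]))

data = [15, 20, 22, 25, 28, 32, 36, 40, 45]
mn, q1, med, q3, mx = five_number_summary(data)

# IQR fences, as defined in the outlier rule above
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
flagged = [x for x in data if x < fences[0] or x > fences[1]]

print("Five-number summary:", (mn, q1, med, q3, mx))
print("Fences:", fences)
print("Outliers:", flagged)
```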
