# Task 12: Introduction to Pandas (Series, DataFrame basics)

# Introduction to Pandas

Pandas is an open-source data analysis and manipulation library for Python. Its two core data structures, Series and DataFrame, make it straightforward to load, clean, transform, summarize, and plot structured data.

## Key Data Structures
### Series
A Series is a one-dimensional labeled array capable of holding any data type. It resembles a single column of a table, or a Python list in which every element also carries an index label.
### DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a table in a database or a spreadsheet in Excel.

In [3]:
import pandas as pd
import numpy as np

#### Create a Pandas Series from a Python list, a NumPy array, and a dictionary.

In [6]:
python_list = [10, 20, 30, 40, 50]

series_from_list = pd.Series(python_list)
print("Series from Python list:")
print(series_from_list)

numpy_array = np.array([10, 20, 30, 40, 50])

series_from_array = pd.Series(numpy_array)
print("\nSeries from NumPy array:")
print(series_from_array)

data_dict = {'A': 10, 'B': 20, 'C': 30, 'D': 40, 'E': 50}

series_from_dict = pd.Series(data_dict)
print("\nSeries from dictionary with custom index:")
print(series_from_dict)

Series from Python list:
0    10
1    20
2    30
3    40
4    50
dtype: int64

Series from NumPy array:
0    10
1    20
2    30
3    40
4    50
dtype: int32

Series from dictionary with custom index:
A    10
B    20
C    30
D    40
E    50
dtype: int64
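
When a Series is built from a dictionary, the keys become the index labels. The sketch below, using a smaller made-up dictionary, shows what happens when an explicit `index` is passed as well: pandas looks up each label in the dictionary, and a label with no matching key gets `NaN`.

```python
import pandas as pd

data_dict = {'A': 10, 'B': 20, 'C': 30}

# Keys become the index labels; values become the data.
s = pd.Series(data_dict)

# An explicit index selects and reorders by key. 'Z' is not a key,
# so its value is NaN and the dtype is promoted to float64.
s_reindexed = pd.Series(data_dict, index=['C', 'A', 'Z'])
print(s_reindexed)
```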


#### Assign a custom index to the Series.

In [7]:
data = [10, 20, 30, 40, 50]

custom_index = ['A', 'B', 'C', 'D', 'E']

series_with_custom_index = pd.Series(data, index=custom_index)
print("Series with custom index:")
print(series_with_custom_index)

Series with custom index:
A    10
B    20
C    30
D    40
E    50
dtype: int64
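
As a small follow-up sketch (the values and labels here are illustrative), the index can be inspected or replaced after construction, and pandas rejects an index whose length does not match the data.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
print(s.index.tolist())  # the labels
print(s.to_numpy())      # the underlying values as a NumPy array

# The index can be replaced as long as the length matches.
s.index = ['x', 'y', 'z']

# Three values with only two labels raises a ValueError.
mismatch_raised = False
try:
    pd.Series([10, 20, 30], index=['A', 'B'])
except ValueError as exc:
    mismatch_raised = True
    print("Length mismatch:", exc)
```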


#### Perform basic arithmetic operations on Series.

In [8]:
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([10, 20, 30, 40])

addition = s1 + s2
print("\nAddition of two Series:")
print(addition)

subtraction = s2 - s1
print("\nSubtraction of two Series:")
print(subtraction)

multiplication = s1 * s2
print("\nMultiplication of two Series:")
print(multiplication)

division = s2 / s1
print("\nDivision of two Series:")
print(division)


Addition of two Series:
0    11
1    22
2    33
3    44
dtype: int64

Subtraction of two Series:
0     9
1    18
2    27
3    36
dtype: int64

Multiplication of two Series:
0     10
1     40
2     90
3    160
dtype: int64

Division of two Series:
0    10.0
1    10.0
2    10.0
3    10.0
dtype: float64
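
The two Series above share the same default index, so the arithmetic looks purely positional. In general, pandas aligns operands by index label first. A minimal sketch with deliberately mismatched labels:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels present in only one operand produce NaN.
aligned_sum = s1 + s2
print(aligned_sum)

# The method form accepts fill_value to treat a missing side as 0.
filled_sum = s1.add(s2, fill_value=0)
print(filled_sum)
```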


#### Access elements using index labels and positions.

In [10]:
print("\nAccessing element at label 'B':", series_from_dict['B'])

print("Accessing element at position 2:", series_from_dict.iloc[2])



Accessing element at label 'B': 20
Accessing element at position 2: 30
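
The explicit accessors `.loc` (labels) and `.iloc` (integer positions) make the intent unambiguous. One difference worth knowing, sketched below on a Series with the same values as above: a label slice includes its end point, while a position slice does not.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['A', 'B', 'C', 'D', 'E'])

by_label = s.loc['B']      # label-based lookup
by_position = s.iloc[1]    # position-based lookup

label_slice = s.loc['B':'D']   # includes 'D'
position_slice = s.iloc[1:3]   # positions 1 and 2 only

print(label_slice)
print(position_slice)
```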


#### Filter the Series to include only values greater than a specific threshold.

In [11]:
filtered_series = series_from_dict[series_from_dict > 30]
print("\nFiltered Series with values greater than 30:")
print(filtered_series)


Filtered Series with values greater than 30:
D    40
E    50
dtype: int64


#### Create a DataFrame from a dictionary of lists.

In [13]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df_from_dict = pd.DataFrame(data)
print("\nDataFrame from dictionary of lists:")
print(df_from_dict)


DataFrame from dictionary of lists:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
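
A DataFrame can also be built row by row from a list of dictionaries. This sketch reuses the same three people and checks the resulting shape and column types.

```python
import pandas as pd

# Each dictionary is one row; keys become column names.
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
]

df_from_records = pd.DataFrame(records)
print(df_from_records.shape)   # (rows, columns)
print(df_from_records.dtypes)  # one dtype per column
```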


#### Create a DataFrame from a NumPy array, specifying column and index names.

In [14]:
numpy_data = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

df_from_numpy = pd.DataFrame(numpy_data, columns=['A', 'B', 'C'], index=['X', 'Y', 'Z'])
print("\nDataFrame from NumPy array with column and index names:")
print(df_from_numpy)


DataFrame from NumPy array with column and index names:
    A   B   C
X  10  20  30
Y  40  50  60
Z  70  80  90
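
With named rows and columns in place, `.loc` and `.iloc` work in two dimensions. A short sketch on an equivalent 3x3 frame:

```python
import numpy as np
import pandas as pd

# Same values as above: 10, 20, ..., 90 arranged in 3 rows.
df_xyz = pd.DataFrame(
    np.arange(10, 100, 10).reshape(3, 3),
    columns=['A', 'B', 'C'],
    index=['X', 'Y', 'Z'],
)

cell = df_xyz.loc['Y', 'B']      # single value by row and column label
row = df_xyz.loc['X']            # one row, returned as a Series
first_col = df_xyz.iloc[:, 0]    # first column by position
```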


#### Load a DataFrame from a CSV file.

In [38]:
df = pd.read_csv("laptop_price_data.csv")
df

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ips,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os
0,Apple,Ultrabook,8,1.37,11.175755,0,1,226.983005,Intel Core i5,0,128,Intel,Mac
1,Apple,Ultrabook,8,1.34,10.776777,0,0,127.677940,Intel Core i5,0,0,Intel,Mac
2,HP,Notebook,8,1.86,10.329931,0,0,141.211998,Intel Core i5,0,256,Intel,Others
3,Apple,Ultrabook,16,1.83,11.814476,0,1,220.534624,Intel Core i7,0,512,AMD,Mac
4,Apple,Ultrabook,8,1.37,11.473101,0,1,226.983005,Intel Core i5,0,256,Intel,Mac
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1268,Asus,Notebook,4,2.20,10.555257,0,0,100.454670,Intel Core i7,500,0,Nvidia,Windows
1269,Lenovo,2 in 1 Convertible,4,1.80,10.433899,1,1,157.350512,Intel Core i7,0,128,Intel,Windows
1270,Lenovo,2 in 1 Convertible,16,1.30,11.288115,1,1,276.053530,Intel Core i7,0,512,Intel,Windows
1271,Lenovo,Notebook,2,1.50,9.409283,0,0,111.935204,Other Intel Processor,0,0,Intel,Windows
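
`read_csv` accepts any file-like object, which makes it easy to experiment without a file on disk. The sketch below uses a tiny inline CSV (an illustrative stand-in, not the actual laptop file) whose first column has no header. By default pandas names such a column `Unnamed: 0`; passing `index_col=0` uses it as the row index instead.

```python
import io
import pandas as pd

# Illustrative sample text; the first header field is empty.
csv_text = """,Company,TypeName,Ram,Weight
0,Apple,Ultrabook,8,1.37
1,Apple,Ultrabook,8,1.34
2,HP,Notebook,8,1.86
"""

# Default: the unlabeled column is kept as data and auto-named.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# index_col=0: the first column becomes the row index.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
```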


#### Display the first and last five rows of the DataFrame.

In [39]:
df.head()

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ips,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os
0,Apple,Ultrabook,8,1.37,11.175755,0,1,226.983005,Intel Core i5,0,128,Intel,Mac
1,Apple,Ultrabook,8,1.34,10.776777,0,0,127.67794,Intel Core i5,0,0,Intel,Mac
2,HP,Notebook,8,1.86,10.329931,0,0,141.211998,Intel Core i5,0,256,Intel,Others
3,Apple,Ultrabook,16,1.83,11.814476,0,1,220.534624,Intel Core i7,0,512,AMD,Mac
4,Apple,Ultrabook,8,1.37,11.473101,0,1,226.983005,Intel Core i5,0,256,Intel,Mac


In [40]:
df.tail()

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ips,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os
1268,Asus,Notebook,4,2.2,10.555257,0,0,100.45467,Intel Core i7,500,0,Nvidia,Windows
1269,Lenovo,2 in 1 Convertible,4,1.8,10.433899,1,1,157.350512,Intel Core i7,0,128,Intel,Windows
1270,Lenovo,2 in 1 Convertible,16,1.3,11.288115,1,1,276.05353,Intel Core i7,0,512,Intel,Windows
1271,Lenovo,Notebook,2,1.5,9.409283,0,0,111.935204,Other Intel Processor,0,0,Intel,Windows
1272,HP,Notebook,6,2.19,10.614129,0,0,100.45467,Intel Core i7,1000,0,AMD,Windows
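
Both methods default to five rows and accept a row count. A quick sketch on a made-up ten-row frame:

```python
import pandas as pd

df_small = pd.DataFrame({'n': range(10)})

first_three = df_small.head(3)  # rows 0, 1, 2
last_two = df_small.tail(2)     # rows 8, 9
```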


#### Get a summary of the DataFrame, including the mean, median (the 50% row), and standard deviation of numeric columns.

In [41]:
df.describe()

,Ram,Weight,Price,TouchScreen,Ips,Ppi,HDD,SSD
count,1273.0,1273.0,1273.0,1273.0,1273.0,1273.0,1273.0,1273.0
mean,8.447761,2.0411,10.828218,0.146897,0.279654,146.950812,413.715632,186.252946
std,5.098771,0.669241,0.619565,0.354142,0.449006,42.926775,518.054486,186.531571
min,2.0,0.69,9.134616,0.0,0.0,90.583402,0.0,0.0
25%,4.0,1.5,10.387379,0.0,0.0,127.335675,0.0,0.0
50%,8.0,2.04,10.872255,0.0,0.0,141.211998,0.0,256.0
75%,8.0,2.31,11.287447,0.0,1.0,157.350512,1000.0,256.0
max,64.0,4.7,12.691441,1.0,1.0,352.465147,2000.0,1024.0
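
`describe()` skips non-numeric columns by default, and the same statistics are available individually. The sketch below uses a small made-up frame to show the median matching the `50%` row and `agg` collecting only the statistics of interest.

```python
import pandas as pd

# Illustrative data, not taken from the laptop file.
df_stats = pd.DataFrame({
    'Company': ['Apple', 'HP', 'Lenovo', 'Asus'],
    'Ram': [4, 8, 8, 16],
    'Weight': [1.2, 1.8, 2.0, 2.6],
})

summary = df_stats.describe()          # numeric columns only
median_ram = df_stats['Ram'].median()  # same value as the 50% row

# agg computes a chosen set of statistics per column.
stats = df_stats[['Ram', 'Weight']].agg(['mean', 'median', 'std'])
print(stats)
```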


#### Extract a single column as a Series.

In [42]:
companies = df['Company']
print("\nSeries extracted from 'Company' column:")
print(companies)


Series extracted from 'Company' column:
0        Apple
1        Apple
2           HP
3        Apple
4        Apple
         ...  
1268      Asus
1269    Lenovo
1270    Lenovo
1271    Lenovo
1272        HP
Name: Company, Length: 1273, dtype: object
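
Single brackets with one column name return a Series, as above. Passing a list of names returns a DataFrame, even when the list has one entry. A sketch on a small made-up frame:

```python
import pandas as pd

df_cols = pd.DataFrame({
    'Company': ['Apple', 'HP'],
    'Ram': [8, 16],
    'Weight': [1.37, 1.86],
})

col_series = df_cols['Company']       # Series
col_frame = df_cols[['Company']]      # one-column DataFrame
subset = df_cols[['Company', 'Ram']]  # two-column DataFrame
```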


#### Filter rows based on column values.

In [43]:
filtered_df = df[df['Company'] == 'Lenovo']
filtered_df

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ips,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os
18,Lenovo,Notebook,8,2.20,10.188167,0,0,141.211998,Intel Core i3,1000,0,Nvidia,Others
21,Lenovo,Gaming,8,2.50,10.882316,0,1,141.211998,Intel Core i5,1000,128,Nvidia,Windows
35,Lenovo,Notebook,4,1.44,9.493014,0,0,111.935204,Other Intel Processor,0,0,Intel,Windows
46,Lenovo,Notebook,4,2.20,9.886358,0,0,100.454670,Intel Core i3,0,128,Intel,Others
50,Lenovo,2 in 1 Convertible,4,0.69,9.740752,1,1,224.173809,Other Intel Processor,0,0,Intel,Others
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1259,Lenovo,2 in 1 Convertible,4,1.80,10.700607,1,0,157.350512,Intel Core i5,0,128,Intel,Windows
1264,Lenovo,Notebook,8,2.60,10.776844,0,1,141.211998,Intel Core i7,1000,0,Nvidia,Windows
1269,Lenovo,2 in 1 Convertible,4,1.80,10.433899,1,1,157.350512,Intel Core i7,0,128,Intel,Windows
1270,Lenovo,2 in 1 Convertible,16,1.30,11.288115,1,1,276.053530,Intel Core i7,0,512,Intel,Windows


In [44]:
average_price = df['Price'].mean()
filtered_df = df[df['Price'] > average_price]
filtered_df

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ips,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os
0,Apple,Ultrabook,8,1.37,11.175755,0,1,226.983005,Intel Core i5,0,128,Intel,Mac
3,Apple,Ultrabook,16,1.83,11.814476,0,1,220.534624,Intel Core i7,0,512,AMD,Mac
4,Apple,Ultrabook,8,1.37,11.473101,0,1,226.983005,Intel Core i5,0,256,Intel,Mac
6,Apple,Ultrabook,16,2.04,11.644108,0,1,220.534624,Intel Core i7,0,0,Intel,Mac
7,Apple,Ultrabook,8,1.34,11.030615,0,0,127.677940,Intel Core i5,0,0,Intel,Mac
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1248,Dell,2 in 1 Convertible,8,1.24,11.478299,1,0,276.053530,Intel Core i5,0,256,Intel,Windows
1252,Lenovo,Notebook,8,1.90,10.952842,0,1,157.350512,Intel Core i5,0,256,Intel,Windows
1255,Asus,Gaming,16,4.00,11.525170,0,1,127.335675,Intel Core i7,1000,128,Nvidia,Windows
1258,MSI,Gaming,8,2.40,11.089517,0,0,141.211998,Intel Core i7,1000,128,Nvidia,Windows
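
Two other ways to express single-column filters are `isin`, for matching against several values, and `query`, which takes the condition as a string. A sketch on a small made-up frame (the prices are illustrative):

```python
import pandas as pd

df_filter = pd.DataFrame({
    'Company': ['Apple', 'HP', 'Lenovo', 'Asus'],
    'Price': [11.2, 10.3, 10.4, 10.6],
})

# Keep rows whose Company is any of the listed values.
selected = df_filter[df_filter['Company'].isin(['Apple', 'Lenovo'])]

# Same idea as a boolean mask, written as an expression string.
pricey = df_filter.query('Price > 10.5')
```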


#### Select rows based on multiple conditions.

In [45]:
apple_laptops_with_high_ram = df[(df['Company'] == 'Apple') & (df['Ram'] > 8)]
apple_laptops_with_high_ram

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ips,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os
3,Apple,Ultrabook,16,1.83,11.814476,0,1,220.534624,Intel Core i7,0,512,AMD,Mac
6,Apple,Ultrabook,16,2.04,11.644108,0,1,220.534624,Intel Core i7,0,0,Intel,Mac
12,Apple,Ultrabook,16,1.83,11.775302,0,1,220.534624,Intel Core i7,0,256,AMD,Mac
17,Apple,Ultrabook,16,1.83,11.933438,0,1,220.534624,Intel Core i7,0,512,AMD,Mac


In [46]:
cheap_touchscreen_laptops = df[(df['Price'] < 10) & (df['TouchScreen'] == 1)]
cheap_touchscreen_laptops

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ips,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os
50,Lenovo,2 in 1 Convertible,4,0.69,9.740752,1,1,224.173809,Other Intel Processor,0,0,Intel,Others
314,Asus,2 in 1 Convertible,2,1.1,9.592332,1,0,135.094211,Other Intel Processor,0,0,Intel,Windows
348,Asus,2 in 1 Convertible,4,1.5,9.902487,1,0,135.094211,Other Intel Processor,0,0,Intel,Windows
429,Mediacom,2 in 1 Convertible,4,1.16,9.676005,1,1,189.905791,Other Intel Processor,0,32,Intel,Windows
560,Acer,2 in 1 Convertible,4,1.25,9.830633,1,1,189.905791,Other Intel Processor,0,0,Intel,Windows
619,Acer,2 in 1 Convertible,4,1.25,9.913097,1,1,135.094211,Other Intel Processor,0,0,Intel,Others
959,Acer,2 in 1 Convertible,4,1.25,9.93914,1,1,135.094211,Other Intel Processor,0,0,Intel,Others
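
Conditions combine with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `==` or `>`. A sketch on a small made-up frame:

```python
import pandas as pd

df_cond = pd.DataFrame({
    'Company': ['Apple', 'Apple', 'HP', 'Lenovo'],
    'Ram': [8, 16, 16, 4],
})

# OR: Apple laptops, or any laptop with at least 16 GB of RAM.
either = df_cond[(df_cond['Company'] == 'Apple') | (df_cond['Ram'] >= 16)]

# NOT: everything except Apple.
not_apple = df_cond[~(df_cond['Company'] == 'Apple')]
```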


#### Add a new column to the DataFrame.

In [47]:
df['Storage'] = df['HDD'] + df['SSD']
df.head()

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ips,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os,Storage
0,Apple,Ultrabook,8,1.37,11.175755,0,1,226.983005,Intel Core i5,0,128,Intel,Mac,128
1,Apple,Ultrabook,8,1.34,10.776777,0,0,127.67794,Intel Core i5,0,0,Intel,Mac,0
2,HP,Notebook,8,1.86,10.329931,0,0,141.211998,Intel Core i5,0,256,Intel,Others,256
3,Apple,Ultrabook,16,1.83,11.814476,0,1,220.534624,Intel Core i7,0,512,AMD,Mac,512
4,Apple,Ultrabook,8,1.37,11.473101,0,1,226.983005,Intel Core i5,0,256,Intel,Mac,256
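
Assigning to `df['Storage']` modifies the DataFrame in place. When the original should stay untouched, `assign` returns a new DataFrame with the extra column. A sketch with made-up storage values:

```python
import pandas as pd

df_disks = pd.DataFrame({'HDD': [0, 1000, 500], 'SSD': [128, 0, 256]})

# assign leaves df_disks unchanged and returns a copy with the new column.
with_storage = df_disks.assign(Storage=df_disks['HDD'] + df_disks['SSD'])
print(with_storage)
```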


#### Delete a column from the DataFrame.

In [48]:
df = df.drop(columns=['Ips'])
df.head()

,Company,TypeName,Ram,Weight,Price,TouchScreen,Ppi,Cpu_brand,HDD,SSD,Gpu_brand,Os,Storage
0,Apple,Ultrabook,8,1.37,11.175755,0,226.983005,Intel Core i5,0,128,Intel,Mac,128
1,Apple,Ultrabook,8,1.34,10.776777,0,127.67794,Intel Core i5,0,0,Intel,Mac,0
2,HP,Notebook,8,1.86,10.329931,0,141.211998,Intel Core i5,0,256,Intel,Others,256
3,Apple,Ultrabook,16,1.83,11.814476,0,220.534624,Intel Core i7,0,512,AMD,Mac,512
4,Apple,Ultrabook,8,1.37,11.473101,0,226.983005,Intel Core i5,0,256,Intel,Mac,256
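
`drop` returns a new DataFrame, which is why the result is assigned back to `df` above. It also accepts several column names at once, and `errors='ignore'` skips names that do not exist, which helps when a cell may be re-run. A sketch on a made-up frame:

```python
import pandas as pd

df_abc = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})

# Drop several columns in one call; df_abc itself is not modified.
trimmed = df_abc.drop(columns=['A', 'B'])

# A missing column would normally raise KeyError; errors='ignore' skips it.
safe = df_abc.drop(columns=['Missing'], errors='ignore')
```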


#### Rename columns in the DataFrame.

In [52]:
df = df.rename(columns={'TypeName': 'Type', 'Cpu_brand': 'CPU'})
df.head()

,Company,Type,Ram,Weight,Price,TouchScreen,Ppi,CPU,HDD,SSD,Gpu_brand,Os,Storage
0,Apple,Ultrabook,8,1.37,11.175755,0,226.983005,Intel Core i5,0,128,Intel,Mac,128
1,Apple,Ultrabook,8,1.34,10.776777,0,127.67794,Intel Core i5,0,0,Intel,Mac,0
2,HP,Notebook,8,1.86,10.329931,0,141.211998,Intel Core i5,0,256,Intel,Others,256
3,Apple,Ultrabook,16,1.83,11.814476,0,220.534624,Intel Core i7,0,512,AMD,Mac,512
4,Apple,Ultrabook,8,1.37,11.473101,0,226.983005,Intel Core i5,0,256,Intel,Mac,256
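
Besides a dictionary, `rename` accepts a function that is applied to every column name, which is handy for normalizing names in bulk. A sketch on a made-up one-row frame:

```python
import pandas as pd

df_names = pd.DataFrame({
    'TypeName': ['Ultrabook'],
    'Cpu_brand': ['Intel Core i5'],
    'Ram': [8],
})

# str.lower is called on each column name; df_names is left unchanged.
lowered = df_names.rename(columns=str.lower)
print(lowered.columns.tolist())
```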
