## NumPy Library

NumPy (short for Numerical Python) is a library for working with multi-dimensional arrays. It also provides a collection of mathematical functions that operate on these arrays, and it is the foundation of scientific computing in Python.

In [1]:
!pip install numpy



## Common NumPy Functions and Operations

## 1. Creating NumPy Arrays
You can create arrays from Python lists or with built-in functions such as arange, zeros, and ones.

Example:

In [2]:
import numpy as np

# Creating a 1D array from a list
arr = np.array([1, 2, 3, 4, 5])
print("1D Array:")
print(arr)

# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:")
print(arr_2d)

# Creating a 3D array
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("3D Array:")
print(arr_3d)

1D Array:
[1 2 3 4 5]
2D Array:
[[1 2 3]
 [4 5 6]]
3D Array:
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


## 2. Creating an array of numbers

In [6]:
ls = [1,1.0, "str", True]
print(ls)

[1, 1.0, 'str', True]


In [5]:
# Creating an array from a list
array_from_list = np.array([1, 2, 3, 4])
print("Array from List:")
print(array_from_list)

# Using arange to create a sequence
array_with_range = np.arange(0, 10, 2)  # Output: [0, 2, 4, 6, 8]
print("Array with arange:")
print(array_with_range)

# Creating an array of zeros
array_zeros = np.zeros((3, 3))  # 3x3 matrix of zeros
print("Array of Zeros:")
print(array_zeros)

# Creating an array of ones
array_ones = np.ones((3, 3))  # 3x3 matrix of ones
print("Array of Ones:")
print(array_ones)

Array from List:
[1 2 3 4]
Array with arange:
[0 2 4 6 8]
Array of Zeros:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Array of Ones:
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


## 3. Creating a linearly spaced vector

In [None]:
import numpy as np
vector = np.linspace(0, 20, 5)
print(vector)

[ 0.  5. 10. 15. 20.]
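
To make the spacing explicit, here is a minimal sketch. It assumes you want to see the step that linspace derives from the number of points, and it contrasts this with arange, which takes the step directly. The variable names are illustrative only.

```python
import numpy as np

# linspace takes the number of points; the step is derived from it.
# retstep=True returns the computed step together with the array.
points, step = np.linspace(0, 20, 5, retstep=True)
print(points)  # 5 evenly spaced values, both endpoints included
print(step)    # spacing between consecutive values

# arange takes the step instead and excludes the stop value
stepped = np.arange(0, 20, 5)
print(stepped)
```

Use linspace when the number of points matters and arange when the step size matters.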


## 4. Creating an array using existing data

In [None]:
import numpy
ls1 = [1,2,3]
a = numpy.asarray(ls1)
print(a)
print(type(a))

[1 2 3]
<class 'numpy.ndarray'>
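
One practical difference between asarray and array shows up when the input is already a NumPy array. The sketch below is illustrative (the variable names are not from the original notebook): asarray returns the existing array without copying, while array makes a copy by default.

```python
import numpy as np

original = np.array([1, 2, 3])

same = np.asarray(original)   # input is already an ndarray: no copy is made
copied = np.array(original)   # np.array copies its input by default

original[0] = 99
print(same[0])    # reflects the change, because it is the same object
print(copied[0])  # unchanged, because it is an independent copy
```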


## 2. Array Operations
NumPy supports element-wise operations, matrix operations, and broadcasting.

Example:

In [7]:
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise addition
sum_arr = arr1 + arr2  # Output: [5, 7, 9]
print("Element-wise Addition")
print(sum_arr)

# Element-wise multiplication
mult_arr = arr1 * arr2  # Output: [4, 10, 18]
print("Element-wise Multiplication")
print(mult_arr)

# Multiplying all elements by a scalar
scalar_mult = arr1 * 3  # Output: [3, 6, 9]
print("Multiplying all elements by a scalar")
print(scalar_mult)

# Dot product of two 1D arrays (for 2D arrays, np.dot performs matrix multiplication)
dot_product = np.dot(arr1, arr2)  # Output: 32 (1*4 + 2*5 + 3*6)
print("Dot Product")
print(dot_product)

Element-wise Addition
[5 7 9]
Element-wise Multiplication
[ 4 10 18]
Multiplying all elements by a scalar
[3 6 9]
Dot Product
32
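
Broadcasting and matrix multiplication on 2D arrays are mentioned above but not demonstrated. The following is a small sketch with made-up values: a 1D row is broadcast across every row of a 2D array, and the @ operator multiplies a (2, 3) array by its (3, 2) transpose.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])    # shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)

# Broadcasting: the row is added to every row of the matrix
broadcast_sum = matrix + row
print(broadcast_sum)

# Matrix multiplication: (2, 3) @ (3, 2) gives a (2, 2) result
product = matrix @ matrix.T
print(product)
```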


## 3. Slicing and Indexing
You can slice and index NumPy arrays just like Python lists, but with more flexibility for multi-dimensional arrays.

In [None]:
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr)
# Slicing elements from index 1 to 3 (the stop index 4 is excluded)
sliced_arr = arr[1:4]  # Output: [20, 30, 40]
sliced_arr

[10 20 30 40 50]


array([20, 30, 40])
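
The extra flexibility for multi-dimensional arrays can be sketched as follows. The 3x4 array and the variable names are illustrative. Each axis gets its own index or slice, separated by a comma.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

element = grid[1, 2]       # row 1, column 2
first_row = grid[0]        # the entire first row
second_col = grid[:, 1]    # every row, column 1
block = grid[0:2, 1:3]     # rows 0-1, columns 1-2

print(element)
print(first_row)
print(second_col)
print(block)
```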

## 4. Reshaping a NumPy Array

In [None]:
import numpy as np
# One-dimensional (1D) array
a = np.arange(6)
print(a)
print()
# Two-dimensional (2D) array: 12 elements arranged as 3 rows x 4 columns
b = np.arange(12)
print(b.reshape(3,4))
print()
# Three-dimensional (3D) array: 24 elements arranged as 2 blocks of 3 x 4
c = np.arange(24)
print(c.reshape(2,3,4))

[0 1 2 3 4 5]

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


In [None]:
import numpy as np
a = np.arange(15)  # Create a one-dimensional array using the arange() function
print(a)
# Check the number of dimensions of the array
print(a.ndim)
# Reshape the array
a = a.reshape(3,5)
print(a)
# Check the shape of the array (rows, columns)
print(a.shape)
# Check the size of the array (the total number of elements)
print(a.size)
# Check the data type of the elements
print(a.dtype)
# Check the size in bytes of each element
print(a.itemsize)
# Check the type of the array object
print(type(a))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
1
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
(3, 5)
15
int64
8
<class 'numpy.ndarray'>
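
Two details of reshape are worth a short sketch: a dimension given as -1 is inferred from the array size, and a shape that does not hold exactly the same number of elements raises a ValueError. The names below, including the reshape_failed flag, are illustrative.

```python
import numpy as np

values = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of elements
inferred = values.reshape(3, -1)
print(inferred.shape)

# The new shape must hold exactly the same number of elements
reshape_failed = False
try:
    values.reshape(5, 3)   # 5 * 3 = 15, but the array has 12 elements
except ValueError as error:
    reshape_failed = True
    print("Cannot reshape:", error)
```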


## Pandas: Data Analysis Library
Pandas is a high-level library built on top of NumPy that provides data structures and data analysis tools. It is designed for working with structured data such as tables, spreadsheets, or the results of SQL queries.

## 1. Pandas Series
A Pandas Series is a one-dimensional labeled array that can hold data of any type (integers, strings, floats, etc.). Each element in the Series is associated with an index label.

Creating a Pandas Series
You can create a Series from a Python list, dictionary, or array.

Example:

In [None]:
import pandas as pd

# Creating a Series from a list
marks = pd.Series([90, 85, 75, 65], index=['Arav', 'Meera', 'Rahul', 'Mohit'])

print(marks)

Arav     90
Meera    85
Rahul    75
Mohit    65
dtype: int64
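
A Series can also be built from a dictionary, in which case the keys become the index labels. The sketch below reuses the same names and marks as the list example and shows lookup by label and filtering by condition.

```python
import pandas as pd

# Dictionary keys become the index labels
marks = pd.Series({'Arav': 90, 'Meera': 85, 'Rahul': 75, 'Mohit': 65})

meera_marks = marks['Meera']     # look up one value by its label
above_80 = marks[marks > 80]     # keep only the entries above 80

print(meera_marks)
print(above_80)
```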


## 2. Pandas DataFrames
A DataFrame is a two-dimensional, size-mutable, and labeled data structure. It is essentially a table where each column can contain different types of data (integer, float, string, etc.).

Creating a Pandas DataFrame
You can create DataFrames from dictionaries, lists, or external data sources like CSV files.

Example:

In [None]:
import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Arav', 'Meera', 'Rahul', 'Mohit'],
    'Marks': [90, 85, 75, 65]
}

df = pd.DataFrame(data)
print(df)

    Name  Marks
0   Arav     90
1  Meera     85
2  Rahul     75
3  Mohit     65
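
Besides a dictionary of columns, a DataFrame can be built from row-oriented data. This is a minimal sketch with illustrative variable names: a list of lists needs explicit column names, while a list of dictionaries takes them from the keys.

```python
import pandas as pd

# List of lists: one inner list per row, column names given separately
rows = [['Arav', 90], ['Meera', 85]]
from_lists = pd.DataFrame(rows, columns=['Name', 'Marks'])

# List of dictionaries: one dictionary per row, keys become column names
records = [{'Name': 'Arav', 'Marks': 90}, {'Name': 'Meera', 'Marks': 85}]
from_records = pd.DataFrame(records)

print(from_lists)
print(from_lists.equals(from_records))
```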


In [None]:
df.to_csv('students.csv', index=False)

## 3. Reading and Writing Data with Pandas
Pandas makes it easy to load data from external files and save it back to files after manipulation. You can read data from CSV, Excel, JSON, and SQL databases.

Reading a CSV File

In [None]:
import pandas as pd

# Reading a CSV file into a DataFrame
df = pd.read_csv('students.csv')

print(df)

    Name  Marks
0   Arav     90
1  Meera     85
2  Rahul     75
3  Mohit     65


Writing Data to a CSV File

In [None]:
# Writing the DataFrame to a CSV file
df.to_csv('output.csv', index=False)
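
The effect of index=False can be seen with a quick round trip. This sketch writes to a temporary folder so it does not touch your files. The file name roundtrip.csv is hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Arav', 'Meera'], 'Marks': [90, 85]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'roundtrip.csv')  # hypothetical file name

    # index=False leaves the row labels out of the file
    df.to_csv(path, index=False)
    restored = pd.read_csv(path)

    # With the default index=True, the row labels are written as an
    # unnamed first column and come back as an extra column when read
    df.to_csv(path)
    with_index = pd.read_csv(path)

print(list(restored.columns))
print(list(with_index.columns))
```

Without index=False, the file gains an extra column that pandas names "Unnamed: 0" when it is read back.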

## 1. Creating Pandas DataFrames
You can create DataFrames from dictionaries, lists, or CSV/Excel files.

In [None]:
import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['Arav', 'Meera', 'Rahul'], 'Marks': [90, 85, 88]}
df = pd.DataFrame(data)

print(df)

    Name  Marks
0   Arav     90
1  Meera     85
2  Rahul     88


In [None]:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(7,4),
columns = ['col1', 'col2', 'col3', 'col4'])
df.head()

       col1      col2      col3      col4
0  0.691560  0.782872  0.724043  0.152824
1  0.941291  0.208102  0.405117  0.325940
2  0.061595  0.313475  0.658677  0.578303
3  0.870533  0.698336  0.824203  0.059969
4  0.606795  0.273607  0.401512  0.831971


In [None]:
df.to_csv('data.csv', index=False)

In [None]:
d = {'odd':np.arange(1,10,2),
'even':np.arange(0,10,2)}
print(d['odd'])
print(d['even'])
df = pd.DataFrame(d)
df.head()

[1 3 5 7 9]
[0 2 4 6 8]


   odd  even
0    1     0
1    3     2
2    5     4
3    7     6
4    9     8


## 2. Reading and Writing Data
Pandas provides methods for reading and writing data from various file formats such as CSV, Excel, and SQL.

In [None]:
import pandas as pd

# Reading a CSV file
df = pd.read_csv('students.csv')
print(df.head())
# Writing to a CSV file
df.to_csv('output.csv', index=False)

    Name  Marks
0   Arav     90
1  Meera     85
2  Rahul     75
3  Mohit     65


In [None]:
# Reading a CSV file
df = pd.read_csv('data.csv')

## Viewing the Top of the DataFrame: head()
The head() method displays the first 5 rows of the DataFrame by default. You can specify a different number of rows if needed.

Example:

In [None]:
# Viewing the top 5 rows of the DataFrame
print("Viewing the top 5 rows:")
print(df.head())

# Viewing the top 3 rows
print("\nViewing the top 3 rows:")
print(df.head(3))

Viewing the top 5 rows:
       col1      col2      col3      col4
0  0.691560  0.782872  0.724043  0.152824
1  0.941291  0.208102  0.405117  0.325940
2  0.061595  0.313475  0.658677  0.578303
3  0.870533  0.698336  0.824203  0.059969
4  0.606795  0.273607  0.401512  0.831971

Viewing the top 3 rows:
       col1      col2      col3      col4
0  0.691560  0.782872  0.724043  0.152824
1  0.941291  0.208102  0.405117  0.325940
2  0.061595  0.313475  0.658677  0.578303


## Viewing the Bottom of the DataFrame: tail()
The tail() method displays the last 5 rows of the DataFrame by default. Like head(), you can specify a different number of rows if desired.

Example:

In [None]:
# Viewing the last 5 rows of the DataFrame
print("Display the last 5 rows")
print(df.tail())

# Viewing the last 2 rows
print("\nDisplay the last 2 rows")
print(df.tail(2))

Display the last 5 rows
       col1      col2      col3      col4
2  0.061595  0.313475  0.658677  0.578303
3  0.870533  0.698336  0.824203  0.059969
4  0.606795  0.273607  0.401512  0.831971
5  0.393382  0.822605  0.961178  0.151446
6  0.854197  0.200633  0.578839  0.011864

Display the last 2 rows
       col1      col2      col3      col4
5  0.393382  0.822605  0.961178  0.151446
6  0.854197  0.200633  0.578839  0.011864


## Selecting Columns
You can access a single column or multiple columns in a DataFrame.

Example:

In [None]:
# Reading a CSV file
df = pd.read_csv('students.csv')

# Accessing a single column
names = df['Name']
print("Names:")
print(names)
# Accessing multiple columns
selected_columns = df[['Name', 'Marks']]
print("\nSelected Columns:")
print(selected_columns)

Names:
0     Arav
1    Meera
2    Rahul
3    Mohit
Name: Name, dtype: object

Selected Columns:
    Name  Marks
0   Arav     90
1  Meera     85
2  Rahul     75
3  Mohit     65
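
The kind of brackets you use decides what you get back. As a small illustration on a two-row frame: single brackets return a Series, while a list of column names returns a DataFrame, even when the list has only one name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arav', 'Meera'], 'Marks': [90, 85]})

as_series = df['Name']     # single brackets: a one-dimensional Series
as_frame = df[['Name']]    # list of names: a DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```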


## Adding a New Column
You can add new columns to a DataFrame.

Example:

In [None]:
# Reading a CSV file
df = pd.read_csv('students.csv')
print(df)
# Adding a new column
df['Grade'] = ['A', 'B', 'C','D']
print(df)

    Name  Marks
0   Arav     90
1  Meera     85
2  Rahul     75
3  Mohit     65
    Name  Marks Grade
0   Arav     90     A
1  Meera     85     B
2  Rahul     75     C
3  Mohit     65     D
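
A new column can also be computed from existing columns instead of being typed in by hand. The sketch below assumes a hypothetical pass mark of 70. The column name Passed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arav', 'Meera', 'Rahul', 'Mohit'],
                   'Marks': [90, 85, 75, 65]})

# Hypothetical pass mark of 70, used only for illustration.
# The comparison is applied to every row at once.
df['Passed'] = df['Marks'] >= 70
print(df)
```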


## Slicing and Indexing Data
Pandas provides multiple ways to access and slice data from DataFrames using loc[] and iloc[].

Using loc[] for Label-based Indexing
You can select rows and columns using labels. loc[] also accepts a boolean condition, which keeps only the rows where the condition is True.

Example:

In [None]:
# Selecting rows where Marks > 70
high_marks = df.loc[df['Marks'] > 70]
print("Rows where Marks > 70:")
print(high_marks)

Rows where Marks > 70:
    Name  Marks Grade
0   Arav     90     A
1  Meera     85     B
2  Rahul     75     C
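
The example above filters with a condition. Selecting by actual labels can be sketched like this, assuming a frame whose index holds the student names (an illustrative setup, not the one used elsewhere in this notebook).

```python
import pandas as pd

df = pd.DataFrame(
    {'Marks': [90, 85, 75, 65], 'Grade': ['A', 'B', 'C', 'D']},
    index=['Arav', 'Meera', 'Rahul', 'Mohit'],
)

one_value = df.loc['Meera', 'Marks']             # a single cell
label_slice = df.loc['Arav':'Rahul', 'Marks']    # label slices include the end label
filtered = df.loc[df['Marks'] > 70, ['Grade']]   # condition plus a list of columns

print(one_value)
print(label_slice)
print(filtered)
```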


## Using iloc[] for Position-based Indexing
You can access rows and columns by index position.

Example:

In [None]:
# Selecting the first two rows and the first two columns
subset = df.iloc[0:2, 0:2]
print("Subset of the DataFrame:")
print(subset)

Subset of the DataFrame:
    Name  Marks
0   Arav     90
1  Meera     85
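
A common source of confusion is that iloc[] and loc[] treat the end of a slice differently. The following minimal sketch uses the default integer index, where positions and labels happen to coincide.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Arav', 'Meera', 'Rahul', 'Mohit'],
                   'Marks': [90, 85, 75, 65]})

by_position = df.iloc[0:2]   # positions 0 and 1; the stop position is excluded
by_label = df.loc[0:2]       # labels 0, 1 and 2; the end label is included
last_row = df.iloc[-1]       # negative positions count from the end

print(len(by_position), len(by_label))
print(last_row['Name'])
```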


## Adding New Rows

### 1. Using loc[]
The loc[] indexer can add a row: assigning to an index label that does not exist yet creates a new row with that label.

Example:

In [None]:
# Adding a new row using loc (label 5 does not exist yet, so a new row is created)
df.loc[5] = ['Mohit', np.nan, 'B']
df

    Name  Marks Grade
0   Arav   90.0     A
1  Meera   85.0     B
2  Rahul   75.0     C
3  Mohit   65.0     D
5  Mohit    NaN     B


### 2. Using pd.concat() for Adding Rows
You can also add rows to a DataFrame by using the pd.concat() function to concatenate a new DataFrame or Series to the existing DataFrame.

Example:

In [None]:
# Creating a new row as a DataFrame
new_row = pd.DataFrame({'Name': ['Sara'], 'Marks': [95]})

# Concatenating the new row to the existing DataFrame
df = pd.concat([df, new_row], ignore_index=True)

print(df)


    Name  Marks Grade
0   Arav   90.0     A
1  Meera   85.0     B
2  Rahul   75.0     C
3  Mohit   65.0     D
4  Mohit    NaN     B
5   Sara   95.0   NaN


### 3. Adding Multiple Rows Using pd.concat()
You can add multiple rows at once by creating a DataFrame with the new rows and concatenating it with the original DataFrame.

Example:

In [None]:
# Creating multiple new rows (np.nan marks a missing value)
new_rows = pd.DataFrame({'Name': ['Aman', 'Riya', 'Aman', 'Riya'], 'Marks': [88, 78, np.nan, 78]})

# Concatenating the new rows to the existing DataFrame
df = pd.concat([df, new_rows], ignore_index=True)

print(df)


    Name  Marks Grade
0   Arav   90.0     A
1  Meera   85.0     B
2  Rahul   75.0     C
3  Mohit   65.0     D
4  Mohit    NaN     B
5   Sara   95.0   NaN
6   Aman   88.0   NaN
7   Riya   78.0   NaN
8   Aman    NaN   NaN
9   Riya   78.0   NaN


## Handling Missing Data
In real-world datasets, missing data is common. Pandas provides functions to identify, fill, or remove missing data.

### Identifying Missing Data

In [None]:
# Checking for missing values
missing_data = df.isnull().sum()
missing_data

Name     0
Marks    2
Grade    5
dtype: int64


### Filling Missing Data
You can replace missing values with a specified value, such as the mean of the column or a fixed value.

Example:

In [None]:
# Filling missing values in 'Marks' column with the mean
# (assigning the result back avoids the deprecated inplace call on a column selection)
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())

In [None]:
df

    Name  Marks Grade
0   Arav   90.0     A
1  Meera   85.0     B
2  Rahul   75.0     C
3  Mohit   65.0     D
4  Mohit  81.75     B
5   Sara   95.0   NaN
6   Aman   88.0   NaN
7   Riya   78.0   NaN
8   Aman  81.75   NaN
9   Riya   78.0   NaN
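
Filling with a fixed value, or with a different value per column, can be sketched as follows. The small frame and the placeholder text 'Unknown' are illustrative assumptions. Passing a dictionary to fillna maps each column to its own fill value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Marks': [90.0, np.nan, 75.0],
                   'Grade': ['A', None, 'C']})

# Each column gets its own fill value: the median for Marks,
# a fixed placeholder (hypothetical) for Grade
filled = df.fillna({'Marks': df['Marks'].median(), 'Grade': 'Unknown'})
print(filled)
```

fillna returns a new DataFrame here, so the original df keeps its missing values unless you assign the result back.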


### Dropping Missing Data

In [None]:
# Dropping rows with missing data
df_cleaned = df.dropna()
df_cleaned

    Name  Marks Grade
0   Arav   90.0     A
1  Meera   85.0     B
2  Rahul   75.0     C
3  Mohit   65.0     D
4  Mohit  81.75     B
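
By default dropna() removes a row if any of its columns is missing, which can discard more data than intended. As a small sketch on made-up rows, the subset argument restricts the check to the columns you name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Arav', 'Sara', 'Aman'],
                   'Marks': [90.0, 95.0, np.nan],
                   'Grade': ['A', None, None]})

any_missing = df.dropna()                  # drop rows with a missing value in any column
marks_only = df.dropna(subset=['Marks'])   # only check the Marks column

print(any_missing)
print(marks_only)
```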


## Basic Statistical Operations

In [None]:
# Calculating the mean of the 'Marks' column
average_marks = df['Marks'].mean()

# Calculating the sum of all marks
total_marks = df['Marks'].sum()

# Getting a summary of statistics for all numeric columns
summary = df.describe()

print("Average Marks:", average_marks)
print("Total Marks:", total_marks)
print("Summary of Statistics:")
print(summary)

Average Marks: 81.75
Total Marks: 817.5
Summary of Statistics:
           Marks
count  10.000000
mean   81.750000
std     8.482007
min    65.000000
25%    78.000000
50%    81.750000
75%    87.250000
max    95.000000
