
# NumPy & Pandas Lab (Questions) - v2
**Duration:** ~2 hours




## Setup
Import NumPy and Pandas as needed in the cell below.

In [1]:
import numpy as np
import pandas as pd

# NumPy: Intro & Array Creation

**Exercise 1:** In comments, note two benefits of NumPy over plain Python lists (e.g., speed via vectorization, convenient broadcasting). Then create an array `a = [1,2,3]` and print `2*a`.

In [2]:
# 1. Convenient broadcasting
# 2. Speed via vectorization

a = np.arange(1,4)
print(2*a)

[2 4 6]
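
As a minimal sketch of why the array matters here, the same `2 *` expression behaves differently on a plain Python list. The name `py_list` is introduced only for this illustration and is not part of the lab.

```python
import numpy as np

py_list = [1, 2, 3]        # hypothetical name, used only in this sketch
a = np.arange(1, 4)

list_result = 2 * py_list  # list repetition, not arithmetic
array_result = 2 * a       # element-wise multiplication

print(list_result)   # [1, 2, 3, 1, 2, 3]
print(array_result)  # [2 4 6]
```

With a list, doubling each element would need a loop or comprehension; the array does it in one vectorized expression.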


**Exercise 2:** Create a 1D array from the list [1,2,3] and print its Python type and NumPy dtype.

In [3]:
lst = [1,2,3]
arr = np.array(lst)
print(type(arr))
print(arr.dtype)


<class 'numpy.ndarray'>
int64


**Exercise 3:** Create a 3x3 array of zeros with integer dtype and print it.

In [5]:
zeross = np.zeros((3,3), dtype = int)
zeross


array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

**Exercise 4:** Create a 2x3 array of ones with float dtype and print its shape and dtype.

In [8]:
oness = np.ones((2,3), dtype = float)
oness

print(oness.shape)
print(oness.dtype)


(2, 3)
float64


**Exercise 5:** Using `arange`, create `[5,10,15,20]`.

In [11]:
arr = np.arange(5,21,5)
arr

array([ 5, 10, 15, 20])

**Exercise 6:** Using `linspace`, create 5 evenly spaced numbers from 0 to 1 (inclusive).

In [13]:
b = np.linspace(0,1,5)
b

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
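
A hedged aside on `linspace`: the optional `retstep` and `endpoint` arguments expose the spacing and control whether the stop value is included. The variable names below are illustrative only.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(0, 1, 5, retstep=True)

# endpoint=False leaves the stop value out, so the spacing becomes 1/5
open_points = np.linspace(0, 1, 5, endpoint=False)

print(points, step)
print(open_points)
```

This is the main practical difference from `arange`, which takes a step size and always excludes the stop value.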

# NumPy: Properties & Element-wise Operations

**Exercise 1:** Given `arr = np.array([[1,2,3],[4,5,6]])`, print its `shape`, `ndim`, and `dtype`.

In [15]:
arr = np.array([[1,2,3], [4,5,6]])
print(arr.shape)
print(arr.ndim)
print(arr.dtype)

(2, 3)
2
int64


**Exercise 2:** Cast the above `arr` to float dtype and print its dtype.

In [18]:
arr_float = arr.astype(float)
print(arr_float.dtype)

float64


**Exercise 3:** Perform element-wise addition, subtraction, multiplication, and division between `a=np.array([10,20,30])` and `b=np.array([1,2,3])` and print all results.

In [19]:
a = np.array([10,20,30])
b = np.array([1,2,3])
print(a+b)
print(a-b)
print(a*b)
print(a/b)

[11 22 33]
[ 9 18 27]
[10 40 90]
[10. 10. 10.]


**Exercise 4:** For `arr = np.array([1,2,3,4])`, compute `arr + 10` and `arr * 2` and print both.

In [20]:
arr = np.array([1,2,3,4])
print(arr + 10)
print(arr * 2)

[11 12 13 14]
[2 4 6 8]


# NumPy: Aggregations

**Exercise 1:** Create `arr = np.arange(1,10).reshape(3,3)` and print the overall `sum()`, `mean()`, and `std()`.

In [23]:
arr = np.arange(1,10).reshape(3,3)
print(arr.sum())
print(arr.mean())
print(arr.std())


45
5.0
2.581988897471611


**Exercise 2:** Using the same `arr`, print column-wise sums (axis=0) and row-wise means (axis=1).

In [24]:
print(arr)
print(arr.sum(axis = 0))   # column-wise sums
print(arr.mean(axis = 1))  # row-wise means

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[12 15 18]
[2. 5. 8.]
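
Because a 3x3 array gives results of the same length along either axis, a non-square array makes the direction of `axis` easier to see. The array `m` below is a made-up example, not part of the lab data.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3): 2 rows, 3 columns

col_sums = m.sum(axis=0)    # collapses the rows: one value per column
row_means = m.mean(axis=1)  # collapses the columns: one value per row

print(col_sums.shape, col_sums)    # (3,) [5 7 9]
print(row_means.shape, row_means)  # (2,) [2. 5.]
```

The axis named in the call is the one that disappears from the result's shape.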


# Pandas: Series Creation, Indexing, Slicing, Vector Ops

**Exercise 1:** Create a Series from the list `[10,20,30]` with index `['a','b','c']` and print it.

In [26]:
ser = pd.Series([10,20,30], index = ['a','b','c'])
ser

a    10
b    20
c    30
dtype: int64

**Exercise 2:** Create a Series from the dict `{'apples':3,'bananas':2,'oranges':5}` and print the value for 'bananas'.

In [28]:
ser2 = pd.Series({'apples': 3, 'bananas': 2, 'oranges': 5})
print(ser2['bananas'])

2


**Exercise 3:** From `s = pd.Series([10,20,30,40,50], index=list('abcde'))`, slice from label 'b' to 'd' and print.

In [34]:
s = pd.Series([10,20,30,40,50], index = list('abcde'))
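print(s['b':'d'])    # label slice: the end label 'd' is included
print(s.iloc[1:4])   # equivalent positional slice: end position 4 is excluded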

b    20
c    30
d    40
dtype: int64
b    20
c    30
d    40
dtype: int64
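
The two slices above return the same rows only because the end points were chosen to match. A small sketch of the difference, using explicit `loc` and `iloc` and illustrative variable names:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))

by_label = s.loc['b':'d']    # label-based: both 'b' and 'd' are included
by_position = s.iloc[1:3]    # position-based: position 3 ('d') is excluded

print(by_label.index.tolist())     # ['b', 'c', 'd']
print(by_position.index.tolist())  # ['b', 'c']
```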


**Exercise 4:** Create `s = pd.Series([1,2,3,4])` and print `s*2 + 1`.

In [36]:
s = pd.Series([1,2,3,4])
print(s*2+1)

0    3
1    5
2    7
3    9
dtype: int64


# Pandas: DataFrame Creation & Overview

**Exercise 1:** Create a DataFrame from a dict with columns `name`, `age`, `score` (3 rows) and print `head()`.

In [39]:
df = pd.DataFrame({
                        'name': ['Kunwar', 'Nakul', 'Aditi'],
                        'age': [19, 21,20],
                        'score': [453, 303, 384]
                })

df.head()

     name  age  score
0  Kunwar   19    453
1   Nakul   21    303
2   Aditi   20    384


**Exercise 2:** Create a DataFrame from a list of dicts with the same columns as above and print `tail()`.

In [40]:
df2 = pd.DataFrame([{'name': 'Kunwar', 'age': 19, 'score': 423},{'name': 'Nakul', 'age': 21, 'score': 384}, {'name': 'Kunwar', 'age': 19, 'score': 423}])
df2.tail()

     name  age  score
0  Kunwar   19    423
1   Nakul   21    384
2  Kunwar   19    423


**Exercise 3:** Create a DataFrame `df_csv` *in memory* with columns name, age, score (3 rows) and call `info()`.

In [42]:
# Pass the header through `columns=` so it is not stored as a data row
df_csv = pd.DataFrame([['Kunwar', 19, 384],
          ['Nakul', 21, 473],
          ['Aditi', 20, 383]],
          columns = ['name', 'age', 'score'])
df_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   age     3 non-null      int64 
 2   score   3 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes


**Exercise 4:** On the same `df_csv`, call `describe()` and print the result.

In [43]:
df_csv.describe()

        age       score
count   3.0    3.000000
mean   20.0  413.333333
std     1.0   51.675268
min    19.0  383.000000
25%    19.5  383.500000
50%    20.0  384.000000
75%    20.5  428.500000
max    21.0  473.000000


# Pandas: Selecting Data with loc[] and iloc[]

**Exercise 1:** Using `df = pd.DataFrame({'name':['Ana','Ben','Cara'],'age':[20,23,21],'score':[81,67,90]})`, select rows where age>20 and only columns `name` and `score` using `loc`.

In [57]:
df = pd.DataFrame({'name':['Ana','Ben','Cara'],'age':[20,23,21],'score':[81,67,90]})
df.loc[df['age'] > 20, ['name', 'score']]

   name  score
1   Ben     67
2  Cara     90


**Exercise 2:** From the same `df`, select the first two rows and first two columns using `iloc`.

In [48]:
df.iloc[:2,:2]

   name  age
0   Ana   20
1   Ben   23


**Exercise 3:** Set `df.index` to `['s1','s2','s3']` and select the row with label `'s2'` using `loc`.

In [52]:
df.index = ['s1','s2','s3']
df.loc['s2']


name     Ben
age       23
score     67
Name: s2, dtype: object

# Pandas: Adding & Removing Columns

**Exercise 1:** Add a column `passed` that is True if score>=80 else False, for `df = pd.DataFrame({'name':['Ana','Ben','Cara'],'score':[81,67,90]})`.

In [59]:
df = pd.DataFrame({'name':['Ana','Ben','Cara'],'score':[81,67,90]})

df['passed'] = df['score'] >= 80
df

   name  score  passed
0   Ana     81    True
1   Ben     67   False
2  Cara     90    True


**Exercise 2:** Remove column `age` from `df` (if present) using `drop(..., axis=1)` and print the remaining columns.

In [60]:
# errors='ignore' skips the drop when 'age' is not present;
# drop() returns a new DataFrame, so the result is assigned back
df = df.drop('age', axis = 1, errors = 'ignore')
df.columns

Index(['name', 'score', 'passed'], dtype='object')
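
A short sketch of how `drop` behaves, on a throwaway frame (`df_demo` and `without_age` are hypothetical names used only here): the call returns a new DataFrame and leaves the original untouched, and `errors='ignore'` covers the "if present" case.

```python
import pandas as pd

df_demo = pd.DataFrame({'name': ['Ana', 'Ben'],
                        'age': [20, 23],
                        'score': [81, 67]})

without_age = df_demo.drop('age', axis=1)   # new frame; df_demo keeps 'age'

# Dropping again would raise KeyError by default; errors='ignore' makes it a no-op
still_without_age = without_age.drop('age', axis=1, errors='ignore')

print(df_demo.columns.tolist())            # ['name', 'age', 'score']
print(still_without_age.columns.tolist())  # ['name', 'score']
```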

# Pandas: Filtering Rows with Conditions

**Exercise 1:** Given `df = pd.DataFrame({'name':['Ana','Ben','Cara','Dan'],'score':[81,67,90,74]})`, select rows with `score` between 70 and 90 (inclusive).

In [63]:
df = pd.DataFrame({'name':['Ana','Ben','Cara','Dan'],'score':[81,67,90,74]})

df.loc[(df['score'] >= 70) & (df['score'] <= 90)]

   name  score
0   Ana     81
2  Cara     90
3   Dan     74
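
An inclusive range check can be written either with two comparisons or with `Series.between`, whose bounds are inclusive by default. The sketch below, with illustrative mask names, checks that both forms select the same rows.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ana', 'Ben', 'Cara', 'Dan'],
                   'score': [81, 67, 90, 74]})

mask_manual = (df['score'] >= 70) & (df['score'] <= 90)
mask_between = df['score'].between(70, 90)   # both bounds included by default

selected = df.loc[mask_between]
print(selected['name'].tolist())   # ['Ana', 'Cara', 'Dan']
```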


# Pandas: Renaming Columns & Updating Values

**Exercise 1:** Rename the column `score` to `marks` for a DataFrame and print new columns.

In [68]:
df.rename(columns = {'score':'marks'}, inplace = True)
df.columns

Index(['name', 'marks'], dtype='object')

**Exercise 2:** Increase `marks` by 5 for the row where `name=='Ana'` and print the updated row.

In [70]:
df.loc[df['name'] == 'Ana', 'marks'] += 5
df.loc[df['name'] == 'Ana']

   name  marks
0   Ana     86


# Pandas: Sorting by Column(s)

**Exercise 1:** Sort `df = pd.DataFrame({'name':['Ana','Ben','Cara'],'marks':[86,67,90]})` by `marks` descending, then by `name` ascending, and print the result.

In [71]:
df = pd.DataFrame({'name':['Ana','Ben','Cara'],'marks':[86,67,90]})

df.sort_values(by = ['marks', 'name'], ascending = [False, True])


   name  marks
2  Cara     90
0   Ana     86
1   Ben     67
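
The data in this exercise has no tied `marks`, so the secondary key never comes into play. A minimal sketch with a made-up tie (`df_ties` is a hypothetical frame, not lab data) shows `name` breaking it:

```python
import pandas as pd

df_ties = pd.DataFrame({'name': ['Cara', 'Ana', 'Ben'],
                        'marks': [90, 90, 67]})

# marks descending first; rows with equal marks are then ordered by name ascending
sorted_ties = df_ties.sort_values(by=['marks', 'name'], ascending=[False, True])

print(sorted_ties['name'].tolist())   # ['Ana', 'Cara', 'Ben']
```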


# Pandas: Handling Missing Data (isnull, dropna, fillna)

**Exercise 1:** Create `df = pd.DataFrame({'a':[1,None,3],'b':[None,2,2],'c':[5,6,None]})` and print the count of missing values in each column.

In [76]:
df = pd.DataFrame({'a':[1,None,3],'b':[None,2,2],'c':[5,6,None]})

print(df.isnull().sum())
print(df.isna().sum())


a    1
b    1
c    1
dtype: int64
a    1
b    1
c    1
dtype: int64


**Exercise 2:** From the same `df`, drop rows where column `a` is missing using `dropna(subset=['a'])` and print the result.

In [77]:
df

     a    b    c
0  1.0  NaN  5.0
1  NaN  2.0  6.0
2  3.0  2.0  NaN


In [75]:
df.dropna(subset = ['a'])

     a    b    c
0  1.0  NaN  5.0
2  3.0  2.0  NaN


**Exercise 3:** Fill missing values in column `b` with the column mean and print the updated DataFrame.

In [78]:
df['b'] = df['b'].fillna(df['b'].mean())
df

     a    b    c
0  1.0  2.0  5.0
1  NaN  2.0  6.0
2  3.0  2.0  NaN
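
The fill value is 2.0 because `mean()` skips missing values by default. A small sketch on a standalone Series (the names `b`, `mean_b`, and `filled` are illustrative) makes that step explicit:

```python
import pandas as pd

b = pd.Series([None, 2, 2], dtype=float)   # same values as column 'b'

mean_b = b.mean()          # NaN is skipped: (2 + 2) / 2
filled = b.fillna(mean_b)  # returns a new Series; b itself is unchanged

print(mean_b)              # 2.0
print(filled.tolist())     # [2.0, 2.0, 2.0]
```

Since `fillna` returns a new object, the result has to be assigned back to the column, as the cell above does.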
