Create a series of 10 elements, random integers from 70 to 100, representing scores on a monthly exam. Set the index to be the month names, starting in September and ending in June. (If these months don’t match the school year in your location, feel free to make them more realistic.)

With this series, write code to answer the following questions:
* What is the student’s average test score for the entire year?
* What is the student’s average test score during the first half of the year (i.e., the first five months)?
* What is the student’s average test score during the second half of the year?
* Did the student improve their performance in the second half? If so, by how much?

In [7]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

In [9]:
s = Series([1, 2, 3, 4, 5])
s

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [10]:
s = Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [None]:
s = Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"], name="my_series")
s

a    1
b    2
c    3
d    4
e    5
Name: my_series, dtype: int64

In [13]:
s = Series(np.array([1, 2, 3, 4, 5]))
s

0    1
1    2
2    3
3    4
4    5
dtype: int64

Creating a random-number generator object:

In [24]:
g = np.random.default_rng(0)

In [25]:
s = Series(g.integers(70, 101, 10))
s

0    96
1    89
2    85
3    78
4    79
5    71
6    72
7    70
8    75
9    95
dtype: int64
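
Note that the upper bound passed to `integers` is exclusive by default, which is why the call above uses 101 to allow a score of 100. As a small sketch, assuming NumPy's `Generator.integers` and its `endpoint` keyword, the inclusive bound can also be stated directly (the name `scores` is only illustrative):

```python
import numpy as np
from pandas import Series

g = np.random.default_rng(0)

# endpoint=True makes the upper bound inclusive, so 100 is a possible score
scores = Series(g.integers(70, 100, size=10, endpoint=True))

# Check that every score falls within the closed range [70, 100]
print(scores.between(70, 100).all())
```

Either spelling should work; `endpoint=True` simply makes the intended range easier to read.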

String indexes

In [27]:
g = np.random.default_rng(0)
s = Series(
    g.integers(70, 101, 10),
    index=["Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
)
s

Mar    96
Apr    89
May    85
Jun    78
Jul    79
Aug    71
Sep    72
Oct    70
Nov    75
Dec    95
dtype: int64

In [28]:
g = np.random.default_rng(0)
s = Series(g.integers(70, 101, 10))
s.index = "Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
s

Mar    96
Apr    89
May    85
Jun    78
Jul    79
Aug    71
Sep    72
Oct    70
Nov    75
Dec    95
dtype: int64

In [30]:
g = np.random.default_rng(0)
s = Series(g.integers(70, 101, 10))
s.index = Series("Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())
s

Mar    96
Apr    89
May    85
Jun    78
Jul    79
Aug    71
Sep    72
Oct    70
Nov    75
Dec    95
dtype: int64

In [31]:
g = np.random.default_rng(0)
months = ["Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
s = Series(g.integers(70, 101, 10), index=months)
s

Mar    96
Apr    89
May    85
Jun    78
Jul    79
Aug    71
Sep    72
Oct    70
Nov    75
Dec    95
dtype: int64
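
The exercise asks for labels running from September to June. One possible way to build them without typing each name is to rotate the abbreviations in the standard library's `calendar` module. This is only a sketch: `school_months` and `scores` are illustrative names, and `calendar.month_abbr` follows the current locale (English by default).

```python
import calendar

import numpy as np
from pandas import Series

# calendar.month_abbr[0] is an empty string, so skip it to get Jan..Dec
abbr = list(calendar.month_abbr)[1:]

# Rotate the list so it runs Sep..Dec, then Jan..Jun (10 school months)
school_months = abbr[8:] + abbr[:6]

g = np.random.default_rng(0)
scores = Series(g.integers(70, 101, 10), index=school_months)
print(scores.index.tolist())
```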

**What is the student’s average test score for the entire year?**

In [32]:
print(f"Yearly average: {s.mean()}")

Yearly average: 81.0


**What is the student’s average test score during the first half of the year (i.e., the first five months)?**

In [33]:
print(f"Average for the first half of the year: {s[:5].mean()}")

Average for the first half of the year: 85.4


**What is the student’s average test score during the second half of the year?**

In [34]:
print(f"Average for the second half of the year: {s[5:].mean()}")

Average for the second half of the year: 76.6


In [36]:
print(f"Average for the first half of the year: {s.iloc[:5].mean()}")
print(f"Average for the second half of the year: {s.iloc[5:].mean()}")

Average for the first half of the year: 85.4
Average for the second half of the year: 76.6


In [None]:
print(f"Average for the first half of the year: {s[months[:5]].mean()}")
print(f"Average for the first half of the year: {s.loc['Mar':'Jul'].mean()}")
print(f"Average for the second half of the year: {s[months[5:]].mean()}")
print(f"Average for the second half of the year: {s.loc['Aug':].mean()}")

Average for the first half of the year: 85.4
Average for the first half of the year: 85.4
Average for the second half of the year: 76.6
Average for the second half of the year: 76.6


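Label-based `.loc` is often more readable than positional `.iloc`, but in some benchmarks it can take about twice as long. Note that `.loc` slices include the end label, whereas `.iloc` slices exclude the end position.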
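
To make the comparison concrete, here is a rough sketch that checks both selections return the same data and times them with `timeit`. The scores are copied from the output shown earlier, and the timings will vary by machine and pandas version, so treat them as indicative only.

```python
import timeit

from pandas import Series

# Scores copied from the output shown earlier in this notebook
months = "Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
s = Series([96, 89, 85, 78, 79, 71, 72, 70, 75, 95], index=months)

# .loc slices by label and INCLUDES the end label ("Jul")
by_label = s.loc["Mar":"Jul"]
# .iloc slices by position and EXCLUDES the end position (5)
by_position = s.iloc[:5]
print(by_label.equals(by_position))

# Rough timing comparison; absolute numbers depend on the environment
loc_time = timeit.timeit(lambda: s.loc["Mar":"Jul"], number=1000)
iloc_time = timeit.timeit(lambda: s.iloc[:5], number=1000)
print(f".loc: {loc_time:.4f}s, .iloc: {iloc_time:.4f}s")
```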

In [39]:
# head() and tail() return five elements by default, so passing 5 is optional
print(f"Average for the first half of the year: {s.head().mean()}")
print(f"Average for the first half of the year: {s.head(5).mean()}")
print(f"Average for the second half of the year: {s.tail().mean()}")
print(f"Average for the second half of the year: {s.tail(5).mean()}")

Average for the first half of the year: 85.4
Average for the first half of the year: 85.4
Average for the second half of the year: 76.6
Average for the second half of the year: 76.6


**Did the student improve their performance in the second half? If so, by how much?**

In [40]:
first_half_average = s["Mar":"Jul"].mean()
second_half_average = s["Aug":"Dec"].mean()
print(f"First half average: {first_half_average}")
print(f"Second half average: {second_half_average}")
print(f"Improvement: {second_half_average - first_half_average}")

First half average: 85.4
Second half average: 76.6
Improvement: -8.800000000000011
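
The trailing digits in the difference are binary floating-point noise, and the negative sign means the scores dropped rather than improved. A minimal sketch of one way to report this more cleanly, using the scores from the output above (the name `change` is illustrative):

```python
from pandas import Series

# Scores copied from the output shown earlier in this notebook
months = "Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
s = Series([96, 89, 85, 78, 79, 71, 72, 70, 75, 95], index=months)

change = s.iloc[5:].mean() - s.iloc[:5].mean()

# Format to one decimal place to hide binary floating-point noise
print(f"Change from first half to second half: {change:+.1f}")

# A negative change means the scores dropped rather than improved
print(f"Improved: {change > 0}")
```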


All together

In [41]:
g = np.random.default_rng(0)
months = "Sep Oct Nov Dec Jan Feb Mar Apr May Jun".split()
s = Series(g.integers(70, 101, 10), index=months)

print(f"Yearly average: {s.mean()}")

first_half_average = s["Sep":"Jan"].mean()
second_half_average = s["Feb":"Jun"].mean()

print(f"First half average: {first_half_average}")
print(f"Second half average: {second_half_average}")

print(f"Improvement: {second_half_average - first_half_average}")

Yearly average: 81.0
First half average: 85.4
Second half average: 76.6
Improvement: -8.800000000000011


More questions

**In which month did this student get their highest score?**

In [65]:
# Sorting the series: the first element after a descending sort is the maximum
highest_score = s.sort_values(ascending=False).iloc[0]
months_highest_score = s[s == highest_score]
print(
    f"Months with the highest score ({highest_score}): {' '.join(months_highest_score.index)}"
)

# Using a boolean mask: keeps every month that ties for the maximum
top_scores = s[s == s.max()]
print(f"Months with the highest score ({s.max()}): {' '.join(top_scores.index)}")

# Using idxmax: returns only the first label holding the maximum value
print(f"Months with the highest score ({s.max()}): {s.idxmax()}")

Months with the highest score (96): Sep
Months with the highest score (96): Sep
Months with the highest score (96): Sep
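
All three approaches agree here because only one month holds the maximum. With a tie they can differ, as this small sketch with made-up scores suggests (`tied` is hypothetical data, not the student's series):

```python
from pandas import Series

# Hypothetical scores with a tie for the highest value
tied = Series([90, 96, 96], index=["Sep", "Oct", "Nov"])

# idxmax returns a single label: the first occurrence of the maximum
first_best = tied.idxmax()

# A boolean mask keeps every month that reached the maximum
all_best = tied[tied == tied.max()].index.tolist()

print(first_best)
print(all_best)
```

If every top month matters, the mask-based version is the safer choice.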


**What were the five highest scores?**

In [79]:
# Using nlargest
five_highest = s.nlargest(5)  # Series
five_highest_str = [str(x) for x in five_highest]  # List of strings
print(f"Five highest scores: {' '.join(five_highest_str)}")

# Sorting the series
five_highest = s.sort_values(ascending=False).head()  # Series
five_highest_str = [str(x) for x in five_highest]  # List of strings
print(f"Five highest scores: {' '.join(five_highest_str)}")

Five highest scores: 96 95 89 85 79
Five highest scores: 96 95 89 85 79
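
Since `nlargest` returns a Series, the month labels travel with the scores. As a sketch (using the scores from the outputs above), `astype(str)` can replace the list comprehension, and `items()` can show which month each score came from:

```python
from pandas import Series

# Scores copied from the outputs shown in this notebook
months = "Sep Oct Nov Dec Jan Feb Mar Apr May Jun".split()
s = Series([96, 89, 85, 78, 79, 71, 72, 70, 75, 95], index=months)

five_highest = s.nlargest(5)

# astype(str) converts the scores so they can be joined directly
print(f"Five highest scores: {' '.join(five_highest.astype(str))}")

# items() yields (label, value) pairs, keeping each month with its score
print(", ".join(f"{month}: {score}" for month, score in five_highest.items()))
```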


**Round the student's scores to the nearest 10**

In [94]:
rounded_scores = s.round(-1)
print(f"Rounded scores: \n{rounded_scores}")
print(f"Rounded scores: \n{rounded_scores.value_counts()}")
rounded_scores_str = [str(x) for x in rounded_scores]
print(f"Rounded scores: {' '.join(rounded_scores_str)}")

Rounded scores: 
Sep    100
Oct     90
Nov     80
Dec     80
Jan     80
Feb     70
Mar     70
Apr     70
May     80
Jun    100
dtype: int64
Rounded scores: 
80     4
70     3
100    2
90     1
Name: count, dtype: int64
Rounded scores: 100 90 80 80 80 70 70 70 80 100


In [108]:
a = Series([15, 75, 875])
print(a.round(-1))
print(a.round(-2))
print(a.round(-3))

0     20
1     80
2    880
dtype: int64
0      0
1    100
2    900
dtype: int64
0       0
1       0
2    1000
dtype: int64
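
The outputs above show both 75 and 85 rounding to 80: exact halves go to the nearest even multiple rather than always up. If "round half up" is wanted instead, one possible alternative is plain integer arithmetic. This is a sketch that assumes non-negative integer scores; `half_up` is an illustrative name.

```python
from pandas import Series

a = Series([15, 75, 85])

# Series.round sends exact halves to the nearest even multiple of 10
print(a.round(-1).tolist())

# Integer arithmetic alternative that always rounds halves up
half_up = (a + 5) // 10 * 10
print(half_up.tolist())
```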


In [None]:
a = Series([15.46])
print(a.round(1))  # Number of decimal places to round to

0    15.5
dtype: float64


In [103]:
a = Series([1546])
print(a.round(-3))  # Negative values round to the left of the decimal point (-3: nearest 1000)

0    2000
dtype: int64
