# Pandas hands-on

1. Create a program to read a CSV file containing student information and partial marks, and
perform various data analysis tasks using pandas functions and statistics, with the following
instructions/subtasks. Upload the Jupyter Notebook on e-learning by 9th of October EOD.

### A. **Read the CSV File**:
- Read the CSV file into a pandas DataFrame.
- Display the first few rows of the DataFrame to understand its structure.

In [3]:
import pandas as pd

studentsInfo = pd.read_csv("school.csv", sep=";") # read school.csv, whose fields are separated by ';'
# studentsInfo.head() # would show only the first five rows
display(studentsInfo) # show the DataFrame (long output is truncated)

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,Unnamed: 8
0,100000,Jane Doe 100000,8240,O,2006-10-03,16.6,16.5,16.8,
1,100001,John Doe 100001,8040,O,2006-09-06,17.4,13.4,3.4,
2,100002,Jane Doe 100002,8220,O,2006-09-06,17.4,18.9,10.8,
3,100003,John Doe 100003,8020,O,2006-09-05,17.6,17.6,14.3,
4,100004,Jane Doe 100004,8220,O,2006-09-05,19.6,19.3,12.2,
...,...,...,...,...,...,...,...,...,...
266,100266,Jane Doe 100266,8204,O,2006-09-19,15.3,15.9,7.2,
267,100267,John Doe 100267,8240,O,2006-11-30,17.5,17.2,18.5,
268,100268,Jane Doe 100268,8240,O,2006-09-07,13.3,17.7,10.7,
269,100269,John Doe 100269,8240,O,2006-09-05,11.0,13.0,4.5,
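
The trailing `Unnamed: 8` column is probably the result of every line in the file ending with a `;`. The sketch below reproduces that effect on a small in-memory sample; the sample rows and the reduced set of columns are assumptions for illustration, not the real contents of `school.csv`.

```python
import io
import pandas as pd

# Hypothetical two-row sample; each line ends with ';' as the real file appears to.
sample = io.StringIO(
    "Numero;Nome;nota1;\n"
    "100000;Jane Doe 100000;16.6;\n"
    "100001;John Doe 100001;17.4;\n"
)

# sep=";" tells pandas which character separates the fields.
df = pd.read_csv(sample, sep=";")

print(df.columns.tolist())  # the empty last field becomes an "Unnamed: N" column
print(df.head())            # head() shows the first five rows (here only two exist)
```

The empty field after the last `;` is read as an extra column with no header and no values, which is why it contains only missing data.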


### B. **Data Cleaning and Preparation**:
- Handle any missing values in the DataFrame, if present.
- Convert necessary columns to appropriate data types (e.g., numeric columns for partial
marks).

In [33]:
studentsInfo.isnull() # flag missing values (True = missing); this only detects them

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,Unnamed: 8
0,False,False,False,False,False,False,False,False,True
1,False,False,False,False,False,False,False,False,True
2,False,False,False,False,False,False,False,False,True
3,False,False,False,False,False,False,False,False,True
4,False,False,False,False,False,False,False,False,True
...,...,...,...,...,...,...,...,...,...
266,False,False,False,False,False,False,False,False,True
267,False,False,False,False,False,False,False,False,True
268,False,False,False,False,False,False,False,False,True
269,False,False,False,False,False,False,False,False,True


In [61]:
studentsInfo.dropna(axis=1) # drop columns with missing values (axis=1 = columns); returns a new DataFrame, studentsInfo itself is unchanged

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3
0,100000,Jane Doe 100000,8240,O,2006-10-03,16.6,16.5,16.8
1,100001,John Doe 100001,8040,O,2006-09-06,17.4,13.4,3.4
2,100002,Jane Doe 100002,8220,O,2006-09-06,17.4,18.9,10.8
3,100003,John Doe 100003,8020,O,2006-09-05,17.6,17.6,14.3
4,100004,Jane Doe 100004,8220,O,2006-09-05,19.6,19.3,12.2
...,...,...,...,...,...,...,...,...
266,100266,Jane Doe 100266,8204,O,2006-09-19,15.3,15.9,7.2
267,100267,John Doe 100267,8240,O,2006-11-30,17.5,17.2,18.5
268,100268,Jane Doe 100268,8240,O,2006-09-07,13.3,17.7,10.7
269,100269,John Doe 100269,8240,O,2006-09-05,11.0,13.0,4.5
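
Because `dropna` returns a new DataFrame, the result has to be assigned for the cleanup to persist. The following is a minimal sketch on made-up data (the values and the name `cleaned` are assumptions) that also contrasts the default `how="any"` with `how="all"`.

```python
import numpy as np
import pandas as pd

# Toy data: one mark is missing and one column is entirely empty.
df = pd.DataFrame({
    "nota1": [16.6, 17.4],
    "nota2": [16.5, np.nan],
    "Unnamed: 8": [np.nan, np.nan],
})

print(df.isnull().sum())  # number of missing values per column

# how="all" drops only columns in which every value is missing.
cleaned = df.dropna(axis=1, how="all")

# The default (how="any") would also drop nota2, which has one missing mark.
strict = df.dropna(axis=1)

print(cleaned.columns.tolist())
print(strict.columns.tolist())
```

With `how="all"` the empty column is removed while columns that merely have a few gaps are kept for later handling.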


In [41]:
studentsInfo.dtypes # data types before conversion (not displayed: a cell shows only its last expression)
# convert columns with pd.to_numeric; errors='coerce' turns any value that cannot be parsed into NaN
studentsInfo['Numero'] = pd.to_numeric(studentsInfo['Numero'], errors='coerce', downcast='integer')
studentsInfo['Nome'] = pd.to_numeric(studentsInfo['Nome'], errors='coerce') # caution: names are text, so every value becomes NaN
studentsInfo['Curso'] = pd.to_numeric(studentsInfo['Curso'], errors='coerce', downcast='integer')
studentsInfo['Regime'] = pd.to_numeric(studentsInfo['Regime'], errors='coerce') # caution: regime codes such as 'O' and 'T' become NaN
studentsInfo['DataInscricao'] = pd.to_numeric(studentsInfo['DataInscricao'], errors='coerce') # caution: date strings become NaN; pd.to_datetime is the appropriate converter
studentsInfo['nota1'] = pd.to_numeric(studentsInfo['nota1'], errors='coerce')
studentsInfo['nota2'] = pd.to_numeric(studentsInfo['nota2'], errors='coerce')
studentsInfo['nota3'] = pd.to_numeric(studentsInfo['nota3'], errors='coerce')

studentsInfo.dtypes # verify conversions

Numero             int32
Nome             float64
Curso              int16
Regime           float64
DataInscricao    float64
nota1            float64
nota2            float64
nota3            float64
Unnamed: 8       float64
dtype: object
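
The `float64` types reported for `Nome`, `Regime` and `DataInscricao` indicate that those columns were coerced to NaN. A safer conversion might look like the sketch below, which assumes a small made-up sample: only genuinely numeric columns go through `pd.to_numeric`, dates go through `pd.to_datetime`, and text columns are left alone.

```python
import pandas as pd

# Toy sample with every column stored as text, as it might arrive from a CSV.
df = pd.DataFrame({
    "Numero": ["100000", "100001"],
    "Nome": ["Jane Doe 100000", "John Doe 100001"],
    "DataInscricao": ["2006-10-03", "2006-09-06"],
    "nota1": ["16.6", "17.4"],
})

# Numeric columns: unparseable entries become NaN instead of raising.
df["Numero"] = pd.to_numeric(df["Numero"], errors="coerce", downcast="integer")
df["nota1"] = pd.to_numeric(df["nota1"], errors="coerce")

# Date column: use the datetime converter rather than to_numeric.
df["DataInscricao"] = pd.to_datetime(df["DataInscricao"], errors="coerce")

# Nome stays as text; passing it to to_numeric would wipe it out.
wiped = pd.to_numeric(df["Nome"], errors="coerce")

print(df.dtypes)
print(wiped.isna().all())
```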

### C. **Data Analysis Tasks**:
- Calculate descriptive statistics (mean, median, minimum, maximum).
- Create a function to calculate the final grade based on partial marks (e.g., weighted average).
- Count the number of students in each grade range (e.g., [0,5[, [5,10[, [10,15[ and
[15,20], based on the final grade).

In [49]:
# calculate descriptive statistics
studentsInfo.describe() # summary statistics

# studentsInfo.mean() # mean
# studentsInfo.median() # median
# studentsInfo.min() # minimum value
# studentsInfo.max() # maximum value

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,Unnamed: 8
count,271.0,0.0,271.0,0.0,0.0,271.0,271.0,271.0,0.0
mean,100135.0,,8132.059041,,,13.675646,14.067528,8.238376,
std,78.375166,,102.977866,,,2.876456,3.275631,4.327021,
min,100000.0,,8004.0,,,4.8,5.5,0.3,
25%,100067.5,,8004.0,,,11.6,11.8,5.25,
50%,100135.0,,8204.0,,,13.2,14.3,7.5,
75%,100202.5,,8220.0,,,15.8,16.75,10.5,
max,100270.0,,8240.0,,,20.0,20.0,19.5,


In [50]:
studentsInfo.mean() # mean

Numero           100135.000000
Nome                       NaN
Curso              8132.059041
Regime                     NaN
DataInscricao              NaN
nota1                13.675646
nota2                14.067528
nota3                 8.238376
Unnamed: 8                 NaN
dtype: float64

In [51]:
studentsInfo.median() # median

Numero           100135.0
Nome                  NaN
Curso              8204.0
Regime                NaN
DataInscricao         NaN
nota1                13.2
nota2                14.3
nota3                 7.5
Unnamed: 8            NaN
dtype: float64

In [52]:
studentsInfo.min() # minimum value

Numero           100000.0
Nome                  NaN
Curso              8004.0
Regime                NaN
DataInscricao         NaN
nota1                 4.8
nota2                 5.5
nota3                 0.3
Unnamed: 8            NaN
dtype: float64

In [53]:
studentsInfo.max() # maximum value

Numero           100270.0
Nome                  NaN
Curso              8240.0
Regime                NaN
DataInscricao         NaN
nota1                20.0
nota2                20.0
nota3                19.5
Unnamed: 8            NaN
dtype: float64

In [17]:
# calculate the final grade as the unweighted mean of the three partial marks (equal weights)
studentsInfo['notaFinal'] = (studentsInfo['nota1'] + studentsInfo['nota2'] + studentsInfo['nota3'])/3

# studentsInfo[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3', 'notaFinal']].head()
display(studentsInfo[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3', 'notaFinal']])

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,notaFinal
0,100000,Jane Doe 100000,8240,O,2006-10-03,16.6,16.5,16.8,16.633333
1,100001,John Doe 100001,8040,O,2006-09-06,17.4,13.4,3.4,11.400000
2,100002,Jane Doe 100002,8220,O,2006-09-06,17.4,18.9,10.8,15.700000
3,100003,John Doe 100003,8020,O,2006-09-05,17.6,17.6,14.3,16.500000
4,100004,Jane Doe 100004,8220,O,2006-09-05,19.6,19.3,12.2,17.033333
...,...,...,...,...,...,...,...,...,...
266,100266,Jane Doe 100266,8204,O,2006-09-19,15.3,15.9,7.2,12.800000
267,100267,John Doe 100267,8240,O,2006-11-30,17.5,17.2,18.5,17.733333
268,100268,Jane Doe 100268,8240,O,2006-09-07,13.3,17.7,10.7,13.900000
269,100269,John Doe 100269,8240,O,2006-09-05,11.0,13.0,4.5,9.500000
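
The task asks for a function and mentions a weighted average, while the cell above computes an equal-weight mean inline. One possible shape for such a function is sketched below; the name `final_grade` and the 30/30/40 weights are assumptions for illustration, not values given in the assignment.

```python
import pandas as pd

def final_grade(marks: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted average of the mark columns named in `weights`."""
    w = pd.Series(weights)
    # Multiply each column by its weight, add up per row, normalise by total weight.
    return (marks[w.index] * w).sum(axis=1) / w.sum()

# Two rows taken from the output above.
marks = pd.DataFrame({
    "nota1": [16.6, 17.4],
    "nota2": [16.5, 13.4],
    "nota3": [16.8, 3.4],
})

equal = final_grade(marks, {"nota1": 1, "nota2": 1, "nota3": 1})
weighted = final_grade(marks, {"nota1": 0.3, "nota2": 0.3, "nota3": 0.4})

print(equal)
print(weighted)
```

Dividing by the sum of the weights means the weights do not have to add up to 1. Note that `sum(axis=1)` skips missing marks by default, so rows with gaps would need separate handling.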


In [79]:
# students in each grade range, based on the final grade (len() of each subset gives the count)
grade0_5 = studentsInfo[(studentsInfo['notaFinal'] >= 0) & (studentsInfo['notaFinal'] < 5)] # final grade between [0-5[
display('Final grade between [0-5[')
display(grade0_5[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3', 'notaFinal']])

grade5_10 = studentsInfo[(studentsInfo['notaFinal'] >= 5) & (studentsInfo['notaFinal'] < 10)] # final grade between [5-10[
display('Final grade between [5-10[')
display(grade5_10[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3', 'notaFinal']])

grade10_15 = studentsInfo[(studentsInfo['notaFinal'] >= 10) & (studentsInfo['notaFinal'] < 15)] # final grade between [10-15[
display('Final grade between [10-15[')
display(grade10_15[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3', 'notaFinal']])

grade15_20 = studentsInfo[(studentsInfo['notaFinal'] >= 15) & (studentsInfo['notaFinal'] <= 20)] # final grade between [15-20]
display('Final grade between [15-20]')
display(grade15_20[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3', 'notaFinal']])


'Final grade between [0-5['

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,notaFinal
50,100050,Jane Doe 100050,8040,O,2006-09-05,5.6,6.9,1.5,4.666667


'Final grade between [5-10['

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,notaFinal
8,100008,Jane Doe 100008,8004,O,2006-11-21,12.3,9.8,6.3,9.466667
11,100011,John Doe 100011,8240,O,2006-09-18,8.9,11.2,4.5,8.200000
15,100015,John Doe 100015,8028,T,2007-01-31,11.2,8.8,4.2,8.066667
18,100018,Jane Doe 100018,8004,O,2006-09-05,10.0,7.6,7.5,8.366667
21,100021,John Doe 100021,8240,O,2006-10-04,10.0,9.3,5.3,8.200000
...,...,...,...,...,...,...,...,...,...
255,100255,John Doe 100255,8204,O,2006-10-03,8.8,9.2,1.8,6.600000
256,100256,Jane Doe 100256,8240,O,2006-09-18,11.9,15.3,2.6,9.933333
261,100261,John Doe 100261,8240,O,2006-09-22,10.1,11.7,0.7,7.500000
264,100264,Jane Doe 100264,8240,O,2006-09-18,10.4,12.6,5.5,9.500000


'Final grade between [10-15['

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,notaFinal
1,100001,John Doe 100001,8040,O,2006-09-06,17.4,13.4,3.4,11.400000
5,100005,John Doe 100005,8040,O,2006-09-05,16.8,14.1,7.7,12.866667
6,100006,Jane Doe 100006,8204,O,2006-09-21,11.5,15.6,6.1,11.066667
7,100007,John Doe 100007,8040,O,2006-09-06,13.6,11.7,5.4,10.233333
10,100010,Jane Doe 100010,8240,O,2006-10-16,17.0,16.4,11.4,14.933333
...,...,...,...,...,...,...,...,...,...
263,100263,John Doe 100263,8220,T,2006-11-27,12.7,16.8,11.7,13.733333
265,100265,John Doe 100265,8220,O,2006-09-20,17.2,17.0,7.5,13.900000
266,100266,Jane Doe 100266,8204,O,2006-09-19,15.3,15.9,7.2,12.800000
268,100268,Jane Doe 100268,8240,O,2006-09-07,13.3,17.7,10.7,13.900000


'Final grade between [15-20]'

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,notaFinal
0,100000,Jane Doe 100000,8240,O,2006-10-03,16.6,16.5,16.8,16.633333
2,100002,Jane Doe 100002,8220,O,2006-09-06,17.4,18.9,10.8,15.7
3,100003,John Doe 100003,8020,O,2006-09-05,17.6,17.6,14.3,16.5
4,100004,Jane Doe 100004,8220,O,2006-09-05,19.6,19.3,12.2,17.033333
9,100009,John Doe 100009,8204,O,2006-09-19,20.0,16.9,15.0,17.3
13,100013,John Doe 100013,8004,O,2006-09-05,16.7,19.4,9.4,15.166667
23,100023,John Doe 100023,8204,O,2006-10-16,19.0,18.1,11.2,16.1
24,100024,Jane Doe 100024,8204,O,2006-09-21,19.4,13.6,13.4,15.466667
25,100025,John Doe 100025,8240,O,2006-11-28,20.0,18.3,17.9,18.733333
26,100026,Jane Doe 100026,8220,O,2006-09-19,16.8,15.4,15.0,15.733333
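
The cells above list the students in each range; the counts themselves can be obtained in one step with `pd.cut`. This is a sketch on made-up final grades. The open upper edge (`np.inf`) is a choice made here so that a grade of exactly 20 falls into the last range, and the label strings are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up final grades, including the boundary values 5, 10, 15 and 20.
final = pd.Series([4.67, 5.0, 9.5, 10.0, 14.93, 15.0, 20.0], name="notaFinal")

labels = ["[0,5[", "[5,10[", "[10,15[", "[15,20]"]
bins = [0, 5, 10, 15, np.inf]

# right=False makes each bin closed on the left and open on the right.
ranges = pd.cut(final, bins=bins, right=False, labels=labels)

# sort=False keeps the ranges in their natural order instead of sorting by count.
counts = ranges.value_counts(sort=False)
print(counts)
```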


### D. Data Manipulation:
- Use groupby to calculate the average marks for each course

In [94]:
averageMarkByCourse = studentsInfo.groupby('Curso')[['notaFinal']].mean() # average final grade for each course
# averageMarkByCourse.head()
display(averageMarkByCourse[['notaFinal']])

Curso,notaFinal
8004,11.53287
8020,12.576923
8028,11.2125
8040,11.814493
8204,11.805797
8220,13.011828
8240,12.310303
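
Selecting the columns before aggregating keeps the result limited to the marks of interest. The sketch below uses a made-up three-row frame to show per-course means for several columns and a small multi-statistic summary with `agg`.

```python
import pandas as pd

# Toy data: two students in course 8240 and one in course 8040.
df = pd.DataFrame({
    "Curso": [8240, 8240, 8040],
    "nota1": [16.0, 18.0, 17.4],
    "notaFinal": [16.0, 12.0, 11.4],
})

# Mean of the selected columns for each course.
avg = df.groupby("Curso")[["nota1", "notaFinal"]].mean()

# Several statistics of a single column for each course.
summary = df.groupby("Curso")["notaFinal"].agg(["mean", "min", "max"])

print(avg)
print(summary)
```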


### E. Data Filtering and Sorting:
- Filter the DataFrame to show only students with marks above a certain threshold.
- Sort the DataFrame by multiple columns (e.g., final grade and number/name)

In [102]:
# students with at least one partial mark above 16
markHigherX = studentsInfo[(studentsInfo['nota1'] > 16) | (studentsInfo['nota2'] > 16) | (studentsInfo['nota3'] > 16)]
display(markHigherX[['Curso', 'nota1', 'nota2', 'nota3']])

# students with all three partial marks above 16
markHigherX2 = studentsInfo[(studentsInfo['nota1'] > 16) & (studentsInfo['nota2'] > 16) & (studentsInfo['nota3'] > 16)]
display(markHigherX2[['Curso', 'nota1', 'nota2', 'nota3']])

Unnamed: 0,Curso,nota1,nota2,nota3
0,8240,16.6,16.5,16.8
1,8040,17.4,13.4,3.4
2,8220,17.4,18.9,10.8
3,8020,17.6,17.6,14.3
4,8220,19.6,19.3,12.2
...,...,...,...,...
262,8020,17.3,18.5,10.9
263,8220,12.7,16.8,11.7
265,8220,17.2,17.0,7.5
267,8240,17.5,17.2,18.5


Unnamed: 0,Curso,nota1,nota2,nota3
0,8240,16.6,16.5,16.8
25,8240,20.0,18.3,17.9
28,8028,17.2,18.4,17.3
29,8220,17.7,16.1,17.2
38,8204,19.1,17.0,18.0
45,8004,18.2,17.3,18.1
66,8240,19.4,18.0,17.3
71,8040,19.6,19.3,19.1
77,8240,17.2,17.5,16.5
78,8004,19.6,19.4,18.9
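
The same two filters can be written with the threshold held in a variable and the comparison applied to all mark columns at once. This is a sketch with three rows taken from the outputs above; the names `threshold`, `mark_cols`, `any_above` and `all_above` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "nota1": [16.6, 17.4, 11.0],
    "nota2": [16.5, 13.4, 13.0],
    "nota3": [16.8, 3.4, 4.5],
})

threshold = 16
mark_cols = ["nota1", "nota2", "nota3"]

# Boolean frame: True where a mark exceeds the threshold.
above = df[mark_cols] > threshold

any_above = df[above.any(axis=1)]  # at least one mark above the threshold
all_above = df[above.all(axis=1)]  # every mark above the threshold

print(any_above)
print(all_above)
```

Changing `threshold` in one place then updates both filters.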


In [113]:
# sort by number
sortNumber = studentsInfo.sort_values(['Numero'])
display('Sort by number')
display(sortNumber[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3']])

# sort by name
sortName = studentsInfo.sort_values(['Nome'])
display('Sort by name')
display(sortName[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3']])

# sort by final grade
sortGrade = studentsInfo.sort_values(['notaFinal'], ascending=False)
display('Sort by final grade')
display(sortGrade[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3', 'notaFinal']])

# sort by course and final grade (both descending)
display('Sort by course and final grade')
sortCourseGrade = studentsInfo.sort_values(['Curso', 'notaFinal'], ascending=False)
display(sortCourseGrade[['Numero', 'Nome', 'Curso', 'Regime', 'DataInscricao', 'nota1', 'nota2', 'nota3', 'notaFinal']])

'Sort by number'

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3
0,100000,Jane Doe 100000,8240,O,2006-10-03,16.6,16.5,16.8
1,100001,John Doe 100001,8040,O,2006-09-06,17.4,13.4,3.4
2,100002,Jane Doe 100002,8220,O,2006-09-06,17.4,18.9,10.8
3,100003,John Doe 100003,8020,O,2006-09-05,17.6,17.6,14.3
4,100004,Jane Doe 100004,8220,O,2006-09-05,19.6,19.3,12.2
...,...,...,...,...,...,...,...,...
266,100266,Jane Doe 100266,8204,O,2006-09-19,15.3,15.9,7.2
267,100267,John Doe 100267,8240,O,2006-11-30,17.5,17.2,18.5
268,100268,Jane Doe 100268,8240,O,2006-09-07,13.3,17.7,10.7
269,100269,John Doe 100269,8240,O,2006-09-05,11.0,13.0,4.5


'Sort by name'

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3
0,100000,Jane Doe 100000,8240,O,2006-10-03,16.6,16.5,16.8
2,100002,Jane Doe 100002,8220,O,2006-09-06,17.4,18.9,10.8
4,100004,Jane Doe 100004,8220,O,2006-09-05,19.6,19.3,12.2
6,100006,Jane Doe 100006,8204,O,2006-09-21,11.5,15.6,6.1
8,100008,Jane Doe 100008,8004,O,2006-11-21,12.3,9.8,6.3
...,...,...,...,...,...,...,...,...
261,100261,John Doe 100261,8240,O,2006-09-22,10.1,11.7,0.7
263,100263,John Doe 100263,8220,T,2006-11-27,12.7,16.8,11.7
265,100265,John Doe 100265,8220,O,2006-09-20,17.2,17.0,7.5
267,100267,John Doe 100267,8240,O,2006-11-30,17.5,17.2,18.5


'Sort by final grade'

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,notaFinal
162,100162,Jane Doe 100162,8204,O,2006-09-05,19.6,19.5,19.5,19.533333
71,100071,John Doe 100071,8040,O,2006-09-05,19.6,19.3,19.1,19.333333
78,100078,Jane Doe 100078,8004,O,2006-09-05,19.6,19.4,18.9,19.300000
181,100181,John Doe 100181,8204,O,2006-09-20,19.6,20.0,17.9,19.166667
25,100025,John Doe 100025,8240,O,2006-11-28,20.0,18.3,17.9,18.733333
...,...,...,...,...,...,...,...,...,...
53,100053,John Doe 100053,8220,O,2006-09-21,10.6,7.3,1.7,6.533333
51,100051,John Doe 100051,8004,O,2006-09-06,4.8,11.9,1.2,5.966667
58,100058,Jane Doe 100058,8204,O,2006-09-21,8.4,8.0,0.3,5.566667
207,100207,John Doe 100207,8004,O,2006-09-05,10.1,5.5,0.7,5.433333


'Sort by course and final grade'

Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,notaFinal
25,100025,John Doe 100025,8240,O,2006-11-28,20.0,18.3,17.9,18.733333
66,100066,Jane Doe 100066,8240,O,2006-10-24,19.4,18.0,17.3,18.233333
170,100170,Jane Doe 100170,8240,O,2006-09-21,17.6,19.3,16.9,17.933333
267,100267,John Doe 100267,8240,O,2006-11-30,17.5,17.2,18.5,17.733333
142,100142,Jane Doe 100142,8240,O,2006-09-06,19.0,18.2,15.0,17.400000
...,...,...,...,...,...,...,...,...,...
253,100253,John Doe 100253,8004,O,2006-09-06,11.2,9.1,2.3,7.533333
173,100173,John Doe 100173,8004,O,2006-09-05,10.6,6.7,3.8,7.033333
46,100046,Jane Doe 100046,8004,O,2006-09-06,8.4,9.4,3.0,6.933333
51,100051,John Doe 100051,8004,O,2006-09-06,4.8,11.9,1.2,5.966667
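
When sorting by several columns, `ascending` also accepts one flag per column, so each key can have its own direction. A minimal sketch on made-up rows: final grade descending, with ties broken by name in alphabetical order.

```python
import pandas as pd

df = pd.DataFrame({
    "Nome": ["John Doe", "Jane Doe", "Ana Doe"],
    "notaFinal": [15.0, 15.0, 17.0],
})

# Highest grade first; equal grades are ordered by name A to Z.
ranked = df.sort_values(["notaFinal", "Nome"], ascending=[False, True])
print(ranked)
```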


### F. Indexing and Selection:
- Demonstrate different ways to select data: using column labels, boolean indexing, and
loc/iloc.
- Set a column as the index and show how it affects data selection.

In [26]:
# ways to select data
display(studentsInfo[['Nome']]) # select one column, 'Nome', from column labels
display(studentsInfo[['Nome', 'Curso']]) # select multiple columns, 'Nome' and 'Curso', from column labels

studentsInfo['DataInscricao'] = pd.to_datetime(studentsInfo['DataInscricao'], errors='coerce') # convert date to datetime type
selectDate = studentsInfo[studentsInfo['DataInscricao'] > pd.to_datetime('2006-11-01')] # select data from boolean indexing
display(selectDate[['Nome', 'Curso', 'DataInscricao']] )

display(studentsInfo.loc[(studentsInfo['Curso'] == 8240) & (studentsInfo['notaFinal'] > 17.5)]) # select rows with the label-based .loc indexer and a boolean condition
display(studentsInfo.iloc[1:25, :5]) # select with the position-based .iloc indexer: rows at positions 1 to 24, first 5 columns (slice end is exclusive)

Unnamed: 0,Nome
0,Jane Doe 100000
1,John Doe 100001
2,Jane Doe 100002
3,John Doe 100003
4,Jane Doe 100004
...,...
266,Jane Doe 100266
267,John Doe 100267
268,Jane Doe 100268
269,John Doe 100269


Unnamed: 0,Nome,Curso
0,Jane Doe 100000,8240
1,John Doe 100001,8040
2,Jane Doe 100002,8220
3,John Doe 100003,8020
4,Jane Doe 100004,8220
...,...,...
266,Jane Doe 100266,8204
267,John Doe 100267,8240
268,Jane Doe 100268,8240
269,John Doe 100269,8240


Unnamed: 0,Nome,Curso,DataInscricao
8,Jane Doe 100008,8004,2006-11-21
12,Jane Doe 100012,8220,2006-11-30
14,Jane Doe 100014,8204,2006-11-21
15,John Doe 100015,8028,2007-01-31
25,John Doe 100025,8240,2006-11-28
101,John Doe 100101,8020,2006-11-27
112,Jane Doe 100112,8004,2006-11-21
114,Jane Doe 100114,8004,2007-03-15
115,John Doe 100115,8004,2006-11-20
121,John Doe 100121,8004,2006-11-21


Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao,nota1,nota2,nota3,Unnamed: 8,notaFinal
25,100025,John Doe 100025,8240,O,2006-11-28,20.0,18.3,17.9,,18.733333
66,100066,Jane Doe 100066,8240,O,2006-10-24,19.4,18.0,17.3,,18.233333
170,100170,Jane Doe 100170,8240,O,2006-09-21,17.6,19.3,16.9,,17.933333
267,100267,John Doe 100267,8240,O,2006-11-30,17.5,17.2,18.5,,17.733333


Unnamed: 0,Numero,Nome,Curso,Regime,DataInscricao
1,100001,John Doe 100001,8040,O,2006-09-06
2,100002,Jane Doe 100002,8220,O,2006-09-06
3,100003,John Doe 100003,8020,O,2006-09-05
4,100004,Jane Doe 100004,8220,O,2006-09-05
5,100005,John Doe 100005,8040,O,2006-09-05
6,100006,Jane Doe 100006,8204,O,2006-09-21
7,100007,John Doe 100007,8040,O,2006-09-06
8,100008,Jane Doe 100008,8004,O,2006-11-21
9,100009,John Doe 100009,8204,O,2006-09-19
10,100010,Jane Doe 100010,8240,O,2006-10-16
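
The difference between the two indexers is easiest to see when the index labels are not simply 0, 1, 2, and so on. The sketch below uses a made-up frame with labels 10 to 14: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end.

```python
import pandas as pd

df = pd.DataFrame(
    {"Nome": list("abcde"), "Curso": [8240, 8040, 8220, 8020, 8220]},
    index=[10, 11, 12, 13, 14],
)

by_label = df.loc[11:13, "Nome"]  # labels 11, 12 and 13 (end label included)
by_pos = df.iloc[1:3, :1]         # positions 1 and 2 (end excluded), first column

# .loc also accepts a boolean mask plus a list of column labels.
masked = df.loc[df["Curso"] == 8220, ["Nome"]]

print(by_label)
print(by_pos)
print(masked)
```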


In [29]:
studentsInfoIndexCourse = studentsInfo.set_index(['Curso']) # use the 'Curso' column as the index (returns a new DataFrame)
display(studentsInfoIndexCourse[['Nome', 'Regime', 'DataInscricao', 'notaFinal']])

Curso,Nome,Regime,DataInscricao,notaFinal
8240,Jane Doe 100000,O,2006-10-03,16.633333
8040,John Doe 100001,O,2006-09-06,11.400000
8220,Jane Doe 100002,O,2006-09-06,15.700000
8020,John Doe 100003,O,2006-09-05,16.500000
8220,Jane Doe 100004,O,2006-09-05,17.033333
...,...,...,...,...
8204,Jane Doe 100266,O,2006-09-19,12.800000
8240,John Doe 100267,O,2006-11-30,17.733333
8240,Jane Doe 100268,O,2006-09-07,13.900000
8240,John Doe 100269,O,2006-09-05,9.500000
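
Once a column is the index, its values become the labels that `.loc` looks up. The sketch below, on a made-up three-row frame, shows selection by course (a repeated label returns every matching row), selection by student number (a unique label), and `reset_index` to turn the index back into a column.

```python
import pandas as pd

df = pd.DataFrame({
    "Numero": [100000, 100001, 100002],
    "Curso": [8240, 8040, 8240],
    "notaFinal": [16.6, 11.4, 13.9],
})

by_course = df.set_index("Curso")
rows_8240 = by_course.loc[8240]  # all rows whose index label is 8240

by_number = df.set_index("Numero")
one_grade = by_number.loc[100001, "notaFinal"]  # single value for one student

restored = by_course.reset_index()  # 'Curso' becomes a regular column again

print(rows_8240)
print(one_grade)
```

An index with unique values, such as the student number, is usually the more convenient choice for looking up individual rows.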


### G. Data Export:
- After performing various operations, export the resulting DataFrame to different formats
(CSV, Excel).

In [31]:
studentsInfo.to_csv('exportCsvAula03.csv', index=False) # export to CSV
studentsInfo.to_excel('exportExcelAula03.xlsx', index=False) # export to Excel
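
A quick way to check an export is to read it back and compare. The sketch below does this for CSV with an in-memory buffer instead of a file; the data and the `;` separator are assumptions chosen to mirror the input file. `to_excel` additionally needs an Excel writer engine such as openpyxl to be installed, so it is left out here.

```python
import io
import pandas as pd

df = pd.DataFrame({"Numero": [100000, 100001], "notaFinal": [16.5, 11.25]})

# Write to an in-memory buffer; index=False leaves out the row index.
buffer = io.StringIO()
df.to_csv(buffer, index=False, sep=";")

# Read the same text back with the same separator.
buffer.seek(0)
restored = pd.read_csv(buffer, sep=";")

print(buffer.getvalue())
print(restored.equals(df))
```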

<br>

**Maria Rafaela Abrunhosa 107658**