# Using Jupyter Notebooks
:label:`sec_jupyter`


This section describes how to edit and run the code
in each section of this book
using the Jupyter Notebook. Make sure you have
installed Jupyter and downloaded the
code as described in
:ref:`chap_installation`.
If you want to know more about Jupyter, see the excellent tutorial in
its [documentation](https://jupyter.readthedocs.io/en/latest/).


## Editing and Running the Code Locally

Suppose that the local path of the book's code is `xx/yy/d2l-en/`. Use the shell to change the directory to this path (`cd xx/yy/d2l-en`) and run the command `jupyter notebook`. Jupyter usually opens in your browser automatically. If it does not, open http://localhost:8888. You will see the Jupyter interface and all the folders containing the code of the book, as shown in :numref:`fig_jupyter00`.

![The folders containing the code of this book.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter00.png?raw=1)
:width:`600px`
:label:`fig_jupyter00`


You can access the notebook files by clicking on the folder displayed on the webpage.
They usually have the suffix ".ipynb".
For the sake of brevity, we create a temporary "test.ipynb" file.
The content displayed after you click it is
shown in :numref:`fig_jupyter01`.
This notebook includes a markdown cell and a code cell. The content in the markdown cell includes "This Is a Title" and "This is text.".
The code cell contains two lines of Python code.

![Markdown and code cells in the "test.ipynb" file.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter01.png?raw=1)
:width:`600px`
:label:`fig_jupyter01`


Double-click the markdown cell to enter edit mode.
Add a new text string "Hello world." at the end of the cell, as shown in :numref:`fig_jupyter02`.

![Edit the markdown cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter02.png?raw=1)
:width:`600px`
:label:`fig_jupyter02`


As demonstrated in :numref:`fig_jupyter03`,
click "Cell" $\rightarrow$ "Run Cells" in the menu bar to run the edited cell.

![Run the cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter03.png?raw=1)
:width:`600px`
:label:`fig_jupyter03`

After running, the markdown cell is shown in :numref:`fig_jupyter04`.

![The markdown cell after running.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter04.png?raw=1)
:width:`600px`
:label:`fig_jupyter04`


Next, click the code cell. After the last line of code, add a line that multiplies the elements by 2, as shown in :numref:`fig_jupyter05`.

![Edit the code cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter05.png?raw=1)
:width:`600px`
:label:`fig_jupyter05`


You can also run the cell with a keyboard shortcut ("Ctrl + Enter" by default) and obtain the output shown in :numref:`fig_jupyter06`.

![Run the code cell to obtain the output.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter06.png?raw=1)
:width:`600px`
:label:`fig_jupyter06`


When a notebook contains more cells, we can click "Kernel" $\rightarrow$ "Restart & Run All" in the menu bar to run all the cells in the entire notebook. By clicking "Help" $\rightarrow$ "Edit Keyboard Shortcuts" in the menu bar, you can edit the shortcuts according to your preferences.

## Advanced Options

Beyond local editing, two things are quite important: editing the notebooks in the markdown format and running Jupyter remotely.
The latter matters when we want to run the code on a faster server.
The former matters because Jupyter's native ipynb format stores a lot of auxiliary data that is
irrelevant to the content,
mostly related to how and where the code is run, such as cell outputs and execution counts.
This produces noisy diffs in Git, which makes
reviewing contributions very difficult.
Fortunately, there is an alternative: native editing in the markdown format.
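The auxiliary data mentioned above is easy to see once you remember that an ipynb file is plain JSON. The following is a minimal sketch that clears the run-specific fields (cell outputs and execution counts) from a notebook. It assumes the nbformat 4 layout, in which code cells carry `outputs` and `execution_count` keys. The helper name `strip_run_metadata` and the tiny notebook are made up for illustration.

```python
import json

def strip_run_metadata(notebook):
    """Return a copy of an nbformat-4 notebook dict without outputs or execution counts."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy; a notebook is plain JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed results, plots, tracebacks
            cell["execution_count"] = None  # drop the In [N] counter
    return cleaned

# A tiny hand-written notebook for illustration (hypothetical, not a file from the book).
nb = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# This Is a Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 7,
         "source": ["a = 1 + 1\n", "a"],
         "outputs": [{"output_type": "execute_result", "execution_count": 7,
                      "data": {"text/plain": ["2"]}, "metadata": {}}]},
    ],
}

clean = strip_run_metadata(nb)
print(json.dumps(clean["cells"][1], indent=1))
```

Only the code cell changes. The markdown cell and the source lines are what a reviewer actually needs to see in a diff.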

### Markdown Files in Jupyter

If you wish to contribute to the content of this book, you need to modify the
source file (md file, not ipynb file) on GitHub.
Using the notedown plugin we
can modify notebooks in the md format directly in Jupyter.


First, install the notedown plugin, run the Jupyter Notebook, and load the plugin:

```
pip install d2l-notedown  # You may need to uninstall the original notedown.
jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'
```

You may also turn on the notedown plugin by default whenever you run the Jupyter Notebook.
First, generate a Jupyter Notebook configuration file (if it has already been generated, you can skip this step).

```
jupyter notebook --generate-config
```

Then, add the following line to the end of the Jupyter Notebook configuration file (for Linux or macOS, usually in the path `~/.jupyter/jupyter_notebook_config.py`):

```
c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'
```

After that, you only need to run the `jupyter notebook` command to turn on the notedown plugin by default.

### Running Jupyter Notebooks on a Remote Server

Sometimes, you may want to run Jupyter notebooks on a remote server and access them through a browser on your local computer. If Linux or macOS is installed on your local machine, you can use port forwarding (Windows can also support this through third-party software such as PuTTY):

```
ssh myserver -L 8888:localhost:8888
```

In the command above, `myserver` is the address of the remote server. The option `-L 8888:localhost:8888` forwards port 8888 on your machine to port 8888 on the server.
You can then use http://localhost:8888 to access the Jupyter notebooks running on `myserver`. We will describe how to run Jupyter notebooks on AWS instances
later in this appendix.
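If http://localhost:8888 does not load, a quick first check is whether anything is listening on that local port. Here is a minimal sketch that uses only Python's standard `socket` module. The helper name `port_is_open` is made up for this example. A successful connection only shows that some program accepts connections on the port. It does not prove that the program is Jupyter.

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False

# With the ssh tunnel running, this should print True.
print(port_is_open("localhost", 8888))
```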

### Timing

We can use the `ExecuteTime` plugin to time the execution of each code cell in Jupyter notebooks.
Use the following commands to install the plugin:

```
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextension enable execute_time/ExecuteTime
```
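ExecuteTime reports the wall-clock time of each cell run. To get repeatable numbers for a single operation, Python's standard `timeit` module is a plain alternative. Inside a notebook, IPython's `%timeit` magic does the same job. The following is a minimal sketch, assuming NumPy is installed. The helper `best_time` and the matrix size `n` are illustrative choices, not part of the book's code.

```python
import timeit
import numpy as np

def best_time(fn, repeat=5, number=3):
    """Return the fastest average time (in seconds) of one call to `fn`."""
    # timeit.repeat calls `fn` `number` times per trial, for `repeat` trials.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return min(totals) / number

n = 256  # kept small so the sketch runs quickly; change it as needed
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

t = best_time(lambda: A @ B)
print(f"A @ B with n={n}: {t * 1e3:.3f} ms per call")
```

Taking the minimum over several trials reduces the effect of other processes competing for the CPU.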

## Summary

* Using the Jupyter Notebook tool, we can edit, run, and contribute to each section of the book.
* We can run Jupyter notebooks on remote servers using port forwarding.


## Exercises

1. Edit and run the code in this book with the Jupyter Notebook on your local machine.
1. Edit and run the code in this book with the Jupyter Notebook *remotely* via port forwarding.
1. Compare the running time of the operations $\mathbf{A}^\top \mathbf{B}$ and $\mathbf{A} \mathbf{B}$ for two square matrices in $\mathbb{R}^{1024 \times 1024}$. Which one is faster?


[Discussions](https://discuss.d2l.ai/t/421)


```python
# importing required modules
# !pip install pandas   # pip => preferred installer program
# !pip install numpy
import pandas as pd
import numpy as np
```


```python
# importing required modules
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

# DataFrame from a list of rows plus a list of column names
user_data = [['alice', 19, 'F', 'student'], ['john', 26, 'M', 'student']]
user_columns = ['name', 'age', 'gender', 'job']
user1 = pd.DataFrame(data=user_data, columns=user_columns)
print(user1)

# DataFrame from a dict that maps column names to lists of values
user_data = dict(name=['eric', 'julie'], age=[22, 58], gender=['M', 'F'], job=['student', 'manager'])
print(user_data, type(user_data))
user2 = pd.DataFrame(data=user_data)
print(user2)

# The same with a dict literal
user_data = {'name': ['peter', 'paul'], 'age': [33, 44], 'gender': ['M', 'F'], 'job': ['engineer', 'scientist']}
print(user_data, type(user_data))
user3 = pd.DataFrame(data=user_data)
print(user3)

# Stack the frames vertically; ignore_index=True renumbers the rows 0..5
users = pd.concat([user1, user2, user3], ignore_index=True)
print(users)

# A second table keyed by name ('andrew' does not appear in users)
dict_data = dict(name=['alice', 'john', 'eric', 'julie', 'andrew'], height=[165, 180, 175, 180, 185])
user4 = pd.DataFrame(data=dict_data)
print(user4)

# inner: only names present in both tables
merge_inner = pd.merge(users, user4, on='name', how='inner')
print(merge_inner)

# outer: names present in either table, NaN where one side has no match
merge_outer = pd.merge(users, user4, on='name', how='outer')
print(merge_outer)

# left: every row of users, with heights where available
merge_left = pd.merge(users, user4, on='name', how='left')
print(merge_left)

# right: every row of user4, with user details where available
merge_right = pd.merge(users, user4, on='name', how='right')
print(merge_right)
```

```
    name  age gender      job
0  alice   19      F  student
1   john   26      M  student
{'name': ['eric', 'julie'], 'age': [22, 58], 'gender': ['M', 'F'], 'job': ['student', 'manager']} <class 'dict'>
    name  age gender      job
0   eric   22      M  student
1  julie   58      F  manager
{'name': ['peter', 'paul'], 'age': [33, 44], 'gender': ['M', 'F'], 'job': ['engineer', 'scientist']} <class 'dict'>
    name  age gender        job
0  peter   33      M   engineer
1   paul   44      F  scientist
    name  age gender        job
0  alice   19      F    student
1   john   26      M    student
2   eric   22      M    student
3  julie   58      F    manager
4  peter   33      M   engineer
5   paul   44      F  scientist
     name  height
0   alice     165
1    john     180
2    eric     175
3   julie     180
4  andrew     185
    name  age gender      job  height
0  alice   19      F  student     165
1   john   26      M  student     180
2   eric   22      M  student     175
3  julie   
```
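The printed results above are cut off after the inner join. One way to see which rows each `how` option keeps is to pass `indicator=True`. This adds a `_merge` column that records whether each row matched in the left frame only, the right frame only, or both. Here is a minimal sketch on a trimmed-down copy of the same data. The subset of three users and two heights was chosen only for brevity.

```python
import pandas as pd

people = pd.DataFrame({"name": ["alice", "peter", "paul"], "age": [19, 33, 44]})
heights = pd.DataFrame({"name": ["alice", "andrew"], "height": [165, 185]})

for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(people, heights, on="name", how=how, indicator=True)
    # _merge is 'both', 'left_only', or 'right_only' for every row
    print(how, dict(zip(merged["name"], merged["_merge"].astype(str))))
```

Only `alice` appears in both frames, so the inner join keeps one row. The left join keeps all three people, the right join keeps both height rows, and the outer join keeps all four names.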

## Quality Control: Duplicate Data

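```python
# Append one extra copy each of rows 0-3 (alice, john, eric, julie)
# so that the frame contains exact duplicates
df = users.copy()
df = pd.concat([df, df[df.index == 0]])
df = pd.concat([df, df[df.index == 1]])
df = pd.concat([df, df[df.index == 2]])
df = pd.concat([df, df[df.index == 3]], ignore_index=True)
```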

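```python
# Sort alphabetically by name and renumber the index
df.sort_values(by='name', inplace=True, ignore_index=True)
df
```

|   | name | age | gender | job |
|---|------|-----|--------|-----|
| 0 | alice | 19 | F | student |
| 1 | alice | 19 | F | student |
| 2 | eric | 22 | M | student |
| 3 | eric | 22 | M | student |
| 4 | john | 26 | M | student |
| 5 | john | 26 | M | student |
| 6 | julie | 58 | F | manager |
| 7 | julie | 58 | F | manager |
| 8 | paul | 44 | F | scientist |
| 9 | peter | 33 | M | engineer |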


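```python
print(df.duplicated())  # True for every row that repeats an earlier row
df[df.duplicated()]     # the repeated rows; not displayed, since only a cell's last expression is shown

df[~df.duplicated()]    # the rows that remain once the repeats are excluded
```

```
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
9    False
dtype: bool
```

|   | name | age | gender | job |
|---|------|-----|--------|-----|
| 0 | alice | 19 | F | student |
| 2 | eric | 22 | M | student |
| 4 | john | 26 | M | student |
| 6 | julie | 58 | F | manager |
| 8 | paul | 44 | F | scientist |
| 9 | peter | 33 | M | engineer |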


```python
# Check for duplicates on a subset of columns only
print(df[['job', 'gender']])
print(df[['job', 'gender']].duplicated())
print(df[['job', 'gender']].duplicated().sum())  # number of duplicated rows
```

```
         job gender
0    student      F
1    student      F
2    student      M
3    student      M
4    student      M
5    student      M
6    manager      F
7    manager      F
8  scientist      F
9   engineer      M
0    False
1     True
2    False
3     True
4     True
5     True
6    False
7     True
8    False
9    False
dtype: bool
5
```


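```python
df.job     # attribute access to a column
df['job']  # equivalent bracket access; only this last expression is displayed
```

|   | job |
|---|-----|
| 0 | student |
| 1 | student |
| 2 | student |
| 3 | student |
| 4 | student |
| 5 | student |
| 6 | manager |
| 7 | manager |
| 8 | scientist |
| 9 | engineer |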


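```python
# Select more than one column by passing a list of column names
df[['job', 'gender']]
```

|   | job | gender |
|---|-----|--------|
| 0 | student | F |
| 1 | student | F |
| 2 | student | M |
| 3 | student | M |
| 4 | student | M |
| 5 | student | M |
| 6 | manager | F |
| 7 | manager | F |
| 8 | scientist | F |
| 9 | engineer | M |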


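```python
# Remove duplicated rows in place and renumber the index
df.drop_duplicates(inplace=True, ignore_index=True)
df
```

|   | name | age | gender | job |
|---|------|-----|--------|-----|
| 0 | alice | 19 | F | student |
| 1 | eric | 22 | M | student |
| 2 | john | 26 | M | student |
| 3 | julie | 58 | F | manager |
| 4 | paul | 44 | F | scientist |
| 5 | peter | 33 | M | engineer |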
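By default, `duplicated()` and `drop_duplicates()` compare whole rows. Both accept `subset` to compare only some columns, as in the `job`/`gender` check above. They also accept `keep` to choose which occurrence counts as the original: `'first'`, `'last'`, or `False` to flag every copy. Here is a small sketch on illustrative rows. Making paul a manager is an assumption for the example, not the book's data.

```python
import pandas as pd

staff = pd.DataFrame({
    "name": ["alice", "julie", "paul", "eric"],
    "gender": ["F", "F", "F", "M"],
    "job": ["student", "manager", "manager", "student"],
})

# Keep the last row for each (job, gender) pair instead of the first.
last_per_pair = staff.drop_duplicates(subset=["job", "gender"], keep="last")
print(last_per_pair)

# keep=False marks every member of a duplicated group, not just the repeats.
print(staff.duplicated(subset=["job", "gender"], keep=False))
```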


## Quality Control: Missing Data

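```python
# merge_left has no height for peter and paul, so those cells are NaN
df = merge_left.copy()
df
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | NaN |
| 5 | paul | 44 | F | scientist | NaN |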


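```python
df = merge_left.copy()
# sort_values returns a new, sorted DataFrame. The result is not assigned
# (and inplace=True is not passed), so df itself keeps its original order.
df.sort_values(by=['gender', 'job'], ascending=True, ignore_index=True)
df
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | NaN |
| 5 | paul | 44 | F | scientist | NaN |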


```python
df.isnull()  # True wherever a value is missing (NaN or None)
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | False | False | False | False | False |
| 1 | False | False | False | False | False |
| 2 | False | False | False | False | False |
| 3 | False | False | False | False | False |
| 4 | False | False | False | False | True |
| 5 | False | False | False | False | True |


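```python
print(df.isnull().sum())  # number of missing values per column
df.isnull()
```

```
name      0
age       0
gender    0
job       0
height    2
dtype: int64
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | False | False | False | False | False |
| 1 | False | False | False | False | False |
| 2 | False | False | False | False | False |
| 3 | False | False | False | False | False |
| 4 | False | False | False | False | True |
| 5 | False | False | False | False | True |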


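```python
print(df.notnull().sum())  # number of non-missing values per column
df.notnull()
```

```
name      6
age       6
gender    6
job       6
height    4
dtype: int64
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | True | True | True | True | True |
| 1 | True | True | True | True | True |
| 2 | True | True | True | True | True |
| 3 | True | True | True | True | True |
| 4 | True | True | True | True | False |
| 5 | True | True | True | True | False |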


```python
print(df.height.isnull())
```

```
0    False
1    False
2    False
3    False
4     True
5     True
Name: height, dtype: bool
```


```python
print(df.height.isnull().sum())   # missing heights
print(df.height.notnull().sum())  # known heights
```

```
2
4
```


```python
print(df.height.isnull(), df.height.isnull().sum())
print(df.height.notnull(), df.height.notnull().sum())
```

```
0    False
1    False
2    False
3    False
4     True
5     True
Name: height, dtype: bool 2
0     True
1     True
2     True
3     True
4    False
5    False
Name: height, dtype: bool 4
```


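```python
df[df.height.isnull()]  # rows whose height is missing
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 4 | peter | 33 | M | engineer | NaN |
| 5 | paul | 44 | F | scientist | NaN |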


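```python
# Strategy 1: delete rows that contain null values
df1 = merge_left.copy()  # untouched copy, reused by the fill strategies below
# As above, the sorted result is discarded; df1 keeps its original order.
df1.sort_values(by=['gender', 'job'], ascending=True, ignore_index=True)
print(df1.shape)
```

```
(6, 5)
```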


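```python
# df is still the merge_left copy from above
df.dropna(inplace=True)  # drop every row with at least one missing value
print(df.shape)
```

```
(4, 5)
```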


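```python
df.dropna(how='all', inplace=True)  # drop only rows in which every value is missing
print(df.shape)
df
```

```
(4, 5)
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |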
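With no arguments, `dropna()` drops a row if any value is missing. With `how='all'`, it drops a row only if every value is missing, which is why the second call above removes nothing further. Two other parameters are often useful. `subset` restricts the check to certain columns, and `thresh` keeps rows that have at least that many non-missing values. Here is a sketch on a small hand-made frame. The rows are invented for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "name": ["alice", "peter", None],
    "age": [19, 33, np.nan],
    "height": [165.0, np.nan, np.nan],
})

print(frame.dropna(subset=["height"]))  # only rows with a known height
print(frame.dropna(thresh=2))           # rows with at least 2 non-missing values
print(frame.dropna(how="all"))          # rows that are not entirely empty
```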


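```python
# Strategy 2: fill in missing values
df = merge_left.copy()
df
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | NaN |
| 5 | paul | 44 | F | scientist | NaN |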


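```python
df = df1.copy()
print(df.height.mean())
# Fill with the mean height. fillna(value) fills every column,
# but only 'height' has gaps here.
df.fillna(df.height.mean(), inplace=True)
df
```

```
175.0
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | 175.0 |
| 5 | paul | 44 | F | scientist | 175.0 |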


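```python
df = df1.copy()
print(df.height.median())
df.fillna(df.height.median(), inplace=True)  # fill with the median height
df
```

```
177.5
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | 177.5 |
| 5 | paul | 44 | F | scientist | 177.5 |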


```python
# The data can have more than one mode, so mode() returns a Series; take its first value
df = df1.copy()
print(df.height.mode(), type(df.height.mode()))
df.fillna(df.height.mode()[0], inplace=True)
df
```

```
0    180.0
Name: height, dtype: float64 <class 'pandas.core.series.Series'>
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | 180.0 |
| 5 | paul | 44 | F | scientist | 180.0 |


```python
df = df1.copy()
print(df)
print(df.height.mode())
df.fillna(df.height.mode()[0], inplace=True)
df
```

```
    name  age gender        job  height
0  alice   19      F    student   165.0
1   john   26      M    student   180.0
2   eric   22      M    student   175.0
3  julie   58      F    manager   180.0
4  peter   33      M   engineer     NaN
5   paul   44      F  scientist     NaN
0    180.0
Name: height, dtype: float64
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | 180.0 |
| 5 | paul | 44 | F | scientist | 180.0 |


```python
# Forward fill: each missing value takes the last valid value above it
df = df1.copy()
print(df)
df.ffill(inplace=True)  # replaces fillna(method='pad'), deprecated since pandas 2.1
df
```

```
    name  age gender        job  height
0  alice   19      F    student   165.0
1   john   26      M    student   180.0
2   eric   22      M    student   175.0
3  julie   58      F    manager   180.0
4  peter   33      M   engineer     NaN
5   paul   44      F  scientist     NaN
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | 180.0 |
| 5 | paul | 44 | F | scientist | 180.0 |


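```python
# Backward fill: each missing value takes the next valid value below it.
# peter and paul are the last rows, so nothing lies below them and they stay NaN.
df = df1.copy()
print(df)
df.bfill(inplace=True)  # replaces fillna(method='bfill'), deprecated since pandas 2.1
df
```

```
    name  age gender        job  height
0  alice   19      F    student   165.0
1   john   26      M    student   180.0
2   eric   22      M    student   175.0
3  julie   58      F    manager   180.0
4  peter   33      M   engineer     NaN
5   paul   44      F  scientist     NaN
```

|   | name | age | gender | job | height |
|---|------|-----|--------|-----|--------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | NaN |
| 5 | paul | 44 | F | scientist | NaN |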
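The following sketch shows two refinements of the fill strategies above. First, passing a dictionary to `fillna()` restricts the fill to the named columns, so a numeric statistic is never written into a text column that happens to have gaps. Second, chaining `ffill()` with `bfill()` fills gaps at both ends of a column. A single backward fill cannot do this, as the last output shows. The missing `job` value and the small Series are invented for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "name": ["alice", "john", "eric", "julie", "peter", "paul"],
    "job": ["student", "student", "student", None, "engineer", "scientist"],
    "height": [165.0, 180.0, 175.0, 180.0, np.nan, np.nan],
})

# Fill only 'height' with its mean; the missing 'job' value is left untouched.
by_mean = frame.fillna({"height": frame["height"].mean()})
print(by_mean)

# A series with gaps at both ends: ffill covers the tail, then bfill covers the head.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])
print(s.ffill().bfill().tolist())
```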


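```python
print(df.columns)
# Assigning to df.columns replaces every label at once, so the list must
# contain exactly one name per column (five here, including height).
df.columns = ['new_name', 'new_age', 'new_gender', 'new_job', 'new_height']
print(df.columns)
df
```

```
Index(['name', 'age', 'gender', 'job', 'height'], dtype='object')
Index(['new_name', 'new_age', 'new_gender', 'new_job', 'new_height'], dtype='object')
```

|   | new_name | new_age | new_gender | new_job | new_height |
|---|----------|---------|------------|---------|------------|
| 0 | alice | 19 | F | student | 165.0 |
| 1 | john | 26 | M | student | 180.0 |
| 2 | eric | 22 | M | student | 175.0 |
| 3 | julie | 58 | F | manager | 180.0 |
| 4 | peter | 33 | M | engineer | NaN |
| 5 | paul | 44 | F | scientist | NaN |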


```python
df = users.copy()
print(df.columns)
print(df)
# rename() maps old labels to new ones and returns a new DataFrame
df = df.rename(columns={"name": "new_name", "age": "new_age", "gender": "new_gender", "job": "new_job"})
print(df.columns)
df
```

```
Index(['name', 'age', 'gender', 'job'], dtype='object')
    name  age gender        job
0  alice   19      F    student
1   john   26      M    student
2   eric   22      M    student
3  julie   58      F    manager
4  peter   33      M   engineer
5   paul   44      F  scientist
Index(['new_name', 'new_age', 'new_gender', 'new_job'], dtype='object')
```

|   | new_name | new_age | new_gender | new_job |
|---|----------|---------|------------|---------|
| 0 | alice | 19 | F | student |
| 1 | john | 26 | M | student |
| 2 | eric | 22 | M | student |
| 3 | julie | 58 | F | manager |
| 4 | peter | 33 | M | engineer |
| 5 | paul | 44 | F | scientist |


```python
df = users.copy()
print(df.columns)
print(df)
# Only the labels listed in the mapping change; the others are kept
df = df.rename(columns={"gender": "new_gender", "name": "new_name"})
print(df.columns)
df
```

```
Index(['name', 'age', 'gender', 'job'], dtype='object')
    name  age gender        job
0  alice   19      F    student
1   john   26      M    student
2   eric   22      M    student
3  julie   58      F    manager
4  peter   33      M   engineer
5   paul   44      F  scientist
Index(['new_name', 'age', 'new_gender', 'job'], dtype='object')
```

|   | new_name | age | new_gender | job |
|---|----------|-----|------------|-----|
| 0 | alice | 19 | F | student |
| 1 | john | 26 | M | student |
| 2 | eric | 22 | M | student |
| 3 | julie | 58 | F | manager |
| 4 | peter | 33 | M | engineer |
| 5 | paul | 44 | F | scientist |
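When every label changes in the same way, `rename()` also accepts a function, and `add_prefix()` covers the common case of prepending a fixed string. Both return a new DataFrame, and neither requires listing every column name. Here is a short sketch on a one-row frame built for the example:

```python
import pandas as pd

frame = pd.DataFrame({"name": ["alice"], "age": [19], "gender": ["F"], "job": ["student"]})

print(frame.add_prefix("new_").columns.tolist())        # ['new_name', 'new_age', 'new_gender', 'new_job']
print(frame.rename(columns=str.upper).columns.tolist())  # ['NAME', 'AGE', 'GENDER', 'JOB']
```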


## Group By on a DataFrame

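```python
df = users.copy()
df
```

|   | name | age | gender | job |
|---|------|-----|--------|-----|
| 0 | alice | 19 | F | student |
| 1 | john | 26 | M | student |
| 2 | eric | 22 | M | student |
| 3 | julie | 58 | F | manager |
| 4 | peter | 33 | M | engineer |
| 5 | paul | 44 | F | scientist |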


```python
# Iterating over a groupby yields (group key, sub-DataFrame) pairs
for group, data_frame in users.groupby('job'):
    print(type(group), type(data_frame))
    print("Group Name", group)
    print(data_frame)
```

```
<class 'str'> <class 'pandas.core.frame.DataFrame'>
Group Name engineer
    name  age gender       job
4  peter   33      M  engineer
<class 'str'> <class 'pandas.core.frame.DataFrame'>
Group Name manager
    name  age gender      job
3  julie   58      F  manager
<class 'str'> <class 'pandas.core.frame.DataFrame'>
Group Name scientist
   name  age gender        job
5  paul   44      F  scientist
<class 'str'> <class 'pandas.core.frame.DataFrame'>
Group Name student
    name  age gender      job
0  alice   19      F  student
1   john   26      M  student
2   eric   22      M  student
```


```python
for group, data_frame in users.groupby('gender'):
    print(type(group), type(data_frame))
    print("Group Name", group)
    print(data_frame)
```

```
<class 'str'> <class 'pandas.core.frame.DataFrame'>
Group Name F
    name  age gender        job
0  alice   19      F    student
3  julie   58      F    manager
5   paul   44      F  scientist
<class 'str'> <class 'pandas.core.frame.DataFrame'>
Group Name M
    name  age gender       job
1   john   26      M   student
2   eric   22      M   student
4  peter   33      M  engineer
```


```python
# Several statistics of 'age' per job, plus 'size' (the number of rows in each group)
grouped_df = df.groupby('job').agg({'age': ['sum', 'mean', 'max', 'min'], 'gender': 'size'})

print(grouped_df)
```

```
          age                    gender
          sum       mean max min   size
job
engineer   33  33.000000  33  33      1
manager    58  58.000000  58  58      1
scientist  44  44.000000  44  44      1
student    67  22.333333  26  19      3
```


```python
# 'count' counts non-missing values; it equals 'size' here because nothing is missing
grouped_df = df.groupby('job').agg({'age': ['sum', 'mean', 'max', 'min'], 'gender': 'count'})

print(grouped_df)
```

```
          age                    gender
          sum       mean max min  count
job
engineer   33  33.000000  33  33      1
manager    58  58.000000  58  58      1
scientist  44  44.000000  44  44      1
student    67  22.333333  26  19      3
```


```python
grouped_df = df.groupby('gender').agg({'age': ['sum', 'mean', 'max', 'min'], 'gender': 'count'})

print(grouped_df)
```

```
        age                    gender
        sum       mean max min  count
gender
F       121  40.333333  58  19      3
M        81  27.000000  33  22      3
```
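The nested dictionary above produces two-level column labels. Named aggregation yields flat labels instead: each keyword becomes an output column and holds a `(column, function)` pair. It also makes the difference between `size` and `count` visible. `size` counts the rows in each group, while `count` counts only non-missing values. In the sketch below, eric's age is deliberately set to missing to show this. That is an assumption for the example, not the book's data.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    "name": ["alice", "john", "eric", "julie"],
    "age": [19, 26, np.nan, 58],
    "job": ["student", "student", "student", "manager"],
})

summary = people.groupby("job").agg(
    rows=("name", "size"),        # every row in the group
    ages_known=("age", "count"),  # non-missing ages only
    mean_age=("age", "mean"),     # NaN is skipped
)
print(summary)
```

For the students, `rows` is 3 but `ages_known` is 2, and `mean_age` averages only 19 and 26.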


## Reading Data into a DataFrame from External Sources

```python
try:
    print("Reading data from CSV file....")
    df = pd.read_csv("salary_table.csv")
    print(df.head())
    print(df.shape)
except (OSError, ValueError) as err:  # missing/unreadable file or malformed CSV
    print("File Access ERROR !!!")
    print("Data file could not be read.....", err)
```

```
Reading data from CSV file....
   salary  experience education management
0   13876           1  Bachelor          Y
1   11608           1      Ph.D          N
2   18701           1      Ph.D          Y
3   11283           1    Master          N
4   11767           1      Ph.D          N
(46, 4)
```


```python
try:
    print("Reading data from CSV file....")
    # read_csv also accepts a URL, so no local copy of the file is needed
    df = pd.read_csv("https://bitbucket.org/toarnabtrainer/aec_ml_python_oct_2025/raw/2480078e55b093732debc46affa8c077c3cfd734/Datafile/salary_table.csv")
    print(df.head())
    print(df.shape)
except (OSError, ValueError) as err:  # network/file errors or malformed CSV
    print("File Access ERROR !!!")
    print("Data file could not be read.....", err)
```

```
Reading data from CSV file....
   salary  experience education management
0   13876           1  Bachelor          Y
1   11608           1      Ph.D          N
2   18701           1      Ph.D          Y
3   11283           1    Master          N
4   11767           1      Ph.D          N
(46, 4)
```
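`read_csv()` accepts any file-like object as well as a path or URL. This makes it easy to try its options without a data file. The sketch below parses a few lines laid out like `salary_table.csv`, using rows copied from the output above, and assumes a comma delimiter. It uses `dtype` to store the yes/no `management` flag as a category.

```python
import io
import pandas as pd

csv_text = """salary,experience,education,management
13876,1,Bachelor,Y
11608,1,Ph.D,N
18701,1,Ph.D,Y
"""

df = pd.read_csv(io.StringIO(csv_text), dtype={"management": "category"})
print(df.dtypes)
print(df.shape)                                   # (3, 4)
print(df["management"].cat.categories.tolist())   # ['N', 'Y']
```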


```python
# Widen the console output so wide DataFrames are not wrapped across lines
pd.options.display.width = 700
```

```python
try:
    print("Reading data from XLSX file....")
    # Reading .xlsx files requires an Excel engine such as openpyxl to be installed
    df = pd.read_excel("./Online Retail.xlsx", sheet_name='Sheet1')
    print(df.head())
    print(df.shape)
except (OSError, ValueError, ImportError) as err:  # missing file or sheet, or missing engine
    print("File Access ERROR !!!")
    print("Data file could not be read.....", err)
```

```
Reading data from XLSX file....
   InvoiceNo StockCode                          Description  Quantity         InvoiceDate  UnitPrice  CustomerID         Country
0     536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6 2010-12-01 08:26:00       2.55       17850  United Kingdom
1     536365     71053                  WHITE METAL LANTERN         6 2010-12-01 08:26:00       3.39       17850  United Kingdom
2     536365    84406B       CREAM CUPID HEARTS COAT HANGER         8 2010-12-01 08:26:00       2.75       17850  United Kingdom
3     536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6 2010-12-01 08:26:00       3.39       17850  United Kingdom
4     536403     22867              HAND WARMER BIRD DESIGN        96 2010-12-01 11:27:00       1.85       12791     Netherlands
(15, 8)
```
