# Jupyter Notebook

In this notebook, I will show you some details of the Jupyter Notebook environment, to help you understand what happens when you start a notebook for analysis and how this differs from other ways of running Python.

## A Python Jupyter Notebook relies on the ordinary Python interpreter

- Whenever you run Python code, you need a Python interpreter. Jupyter Notebook is a user-friendly environment for Python data analysis, but it is still Python, so it uses the same interpreter as any other way of running Python.

In [1]:
import sys
print(sys.executable)

# You can see my python location below, from this book's conda environment

/Users/hq/miniconda3/envs/genome_book/bin/python


In [2]:
%%bash
# %%bash means the code in this cell is run in bash, not in Python

# Now if I look up python from the shell, it's the same thing!
which python

/Users/hq/miniconda3/envs/genome_book/bin/python
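
The same check can be made without leaving Python. The sketch below compares sys.executable with the first python command found on PATH by shutil.which. It assumes nothing about your setup: there may be no python command on PATH at all, and the two paths can legitimately differ (for example, when the notebook kernel was registered from a different environment than the shell's), so the result is printed rather than taken for granted. The helper name interpreter_paths is made up for this example.

```python
import os
import shutil
import sys


def interpreter_paths():
    """Return (path of the running interpreter, first `python` on PATH or None)."""
    running = os.path.realpath(sys.executable)
    on_path = shutil.which("python")  # None when no `python` command is on PATH
    if on_path is not None:
        on_path = os.path.realpath(on_path)  # resolve symlinks before comparing
    return running, on_path


running, on_path = interpreter_paths()
print("running interpreter:", running)
print("python on PATH:     ", on_path)
print("same interpreter:   ", running == on_path)
```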


## A live Jupyter Notebook is a Python process

In [3]:
import os
pid = os.getpid()  # returns the process ID of the Python process running this notebook

import psutil
# this package helps me get the process name,
# but it's OK not to install it in your environment, I won't use it for analysis
process = psutil.Process(pid)
process

psutil.Process(pid=2252, name='python3.7', started='11:08:16')
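
The process ID can also be explored with the standard library alone, so psutil is not required. The sketch below assumes that sys.executable can be launched as a subprocess on your machine; the helper name child_pid is made up for this example.

```python
import os
import subprocess
import sys


def child_pid():
    """Start a separate Python process and return the PID it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Repeated calls inside the same process always give the same PID.
first = os.getpid()
second = os.getpid()
print(first == second)

# A freshly started interpreter is a different process with its own PID.
print(child_pid() != first)
```

Both comparisons print True: the notebook keeps one process for its whole lifetime, while every newly launched interpreter gets a PID of its own.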

## A live Jupyter Notebook contains lots of variables

- When you start a Jupyter notebook, you also start a Python process, which stays alive until you kill it or shut down the Jupyter Notebook server.
- Every cell you execute can create variables, which are kept in the memory of that process, so later cells can still read their values.
- Jupyter also maintains some variables of its own that describe the notebook, such as the history of the cells you have run.
- Here is a way to confirm this:

In [4]:
# the locals() function returns a dict of all local variables
locals()

{'__name__': '__main__',
 '__doc__': 'Automatically created module for IPython interactive environment',
 '__package__': None,
 '__loader__': None,
 '__spec__': None,
 '__builtin__': <module 'builtins' (built-in)>,
 '__builtins__': <module 'builtins' (built-in)>,
 '_ih': ['',
  "# This stackoverflow tell me how to check the location of my current python interpreter\n\nimport sys\nprint(sys.executable)\n\n# You can see my python location below, from this book's conda environment",
  "get_ipython().run_cell_magic('bash', '', '# the %%bash means code in this cell need to be run in bash, but not python code\\n\\n# Now if I execute python from the shell, its the same thing!\\nwhich python\\n')",
  "import os\npid = os.getpid() # this return the process id of my current python process for this jupyter notebook\n\nimport psutil  \n# this package help me get the process name, \n# but its OK not to install this in your envrionment, I won't use it for analysis\nprocess = psutil.Process(pid)\npro

In [5]:
# What locals() returned is actually a dict that contains all local variables of this notebook;
# let's take a look at something we just imported

local_dict = locals()
local_dict['sys']

# you see, the name "sys" corresponds to the Python module that we imported in a cell above

<module 'sys' (built-in)>

In [6]:
# now I create some other variables
a = 1
b = 2
c = 3

In [7]:
# they are all in locals(), so we can look up each variable by its name
local_dict = locals()
print(local_dict['a'])
print(local_dict['b'])
print(local_dict['c'])

1
2
3


In [8]:
# of course, you don't need locals() to get them; I was just showing you the background details
# the usual way is
print(a)
print(b)
print(c)

1
2
3
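
The reason a variable created in one cell is visible in the next is that all cells run in one shared namespace. The sketch below imitates this with plain exec and a dictionary. It is a simplification for illustration only, not how the Jupyter kernel is actually implemented, and the names cells and namespace are made up for this example.

```python
# Each string stands in for the source code of one notebook cell.
cells = [
    "a = 1\nb = 2",
    "c = a + b",          # uses names defined by the previous "cell"
    "total = a + b + c",
]

# One dictionary plays the role of the notebook's shared namespace.
namespace = {}
for source in cells:
    exec(source, namespace)

print(namespace["c"])
print(namespace["total"])
```

Because every "cell" writes into the same dictionary, the later ones can read a, b, and c without defining them again.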


## Here is a magic command to list all variables

- A command that starts with % (or with %% on the first line of a cell, like %%bash above) is called an IPython magic command.
- Jupyter Notebook relies on a package called IPython, which implements these magic commands to make your coding more efficient.
- [This page](https://ipython.readthedocs.io/en/stable/interactive/magics.html) lists the built-in magic commands; we will use some of them in the future.

In [9]:
%whos

# it's OK if your list is not exactly the same as mine, but you should at least see a, b, c, local_dict, and sys

Variable        Type        Data/Info
-------------------------------------
a               int         1
b               int         2
c               int         3
json            module      <module 'json' from '/Use<...>hon3.7/json/__init__.py'>
local_dict      dict        n=44
os              module      <module 'os' from '/Users<...>ook/lib/python3.7/os.py'>
pid             int         2252
process         Process     psutil.Process(pid=2252, <...>3.7', started='11:08:16')
psutil          module      <module 'psutil' from '/U<...>ages/psutil/__init__.py'>
sys             module      <module 'sys' (built-in)>
yapf_reformat   function    <function yapf_reformat at 0x10eafaa70>


In [10]:
# Now let's import pandas as pd; pd becomes an alias for pandas
import pandas as pd

In [11]:
%whos

# The module pd has been added to the local variables;
# we can use pd.DataFrame to create a data frame or pd.read_csv to read CSV files

# %whos hides many of Jupyter's default variables; here it shows only the variables defined by me

Variable        Type        Data/Info
-------------------------------------
a               int         1
b               int         2
c               int         3
json            module      <module 'json' from '/Use<...>hon3.7/json/__init__.py'>
local_dict      dict        n=47
os              module      <module 'os' from '/Users<...>ook/lib/python3.7/os.py'>
pd              module      <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
pid             int         2252
process         Process     psutil.Process(pid=2252, <...>3.7', started='11:08:16')
psutil          module      <module 'psutil' from '/U<...>ages/psutil/__init__.py'>
sys             module      <module 'sys' (built-in)>
yapf_reformat   function    <function yapf_reformat at 0x10eafaa70>
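
The kind of filtering that %whos performs can be approximated in a few lines. This is a rough sketch and not IPython's actual implementation: the function name list_user_variables is hypothetical, and it only skips names that start with an underscore, which is just part of what IPython does.

```python
import sys


def list_user_variables(namespace):
    """Return (name, type name) pairs for names that do not start with '_'."""
    rows = []
    for name in sorted(namespace):
        if name.startswith("_"):
            continue  # skip private and bookkeeping names such as __name__
        rows.append((name, type(namespace[name]).__name__))
    return rows


# A small stand-in namespace; in a notebook you could pass globals() instead.
example_namespace = {"__name__": "__main__", "_ih": [""], "a": 1, "sys": sys}

for name, type_name in list_user_variables(example_namespace):
    print(f"{name:<10}{type_name}")
```

If you pass globals() inside a real notebook, expect a few more names than %whos shows, because this sketch does not know which variables IPython created itself.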


In [12]:
# use functions from the pd module
sample_df = pd.read_csv('../file_io/sample_table.csv')

In [13]:
# the name pandas will not work, because it is not in the local variables
sample_df = pandas.read_csv('../file_io/sample_table.csv')

NameError: name 'pandas' is not defined

In [14]:
# using the function name directly won't work either, because it is not in the local variables
sample_df = read_csv('../file_io/sample_table.csv')

NameError: name 'read_csv' is not defined

In [15]:
# But we can import it like this, and then it works
from pandas import read_csv  # this takes the function from the module and adds it to my local variables

sample_df = read_csv('../file_io/sample_table.csv')

In [16]:
%whos

# see what's in the local variables now: read_csv has been added!

Variable        Type         Data/Info
--------------------------------------
a               int          1
b               int          2
c               int          3
json            module       <module 'json' from '/Use<...>hon3.7/json/__init__.py'>
local_dict      dict         n=54
os              module       <module 'os' from '/Users<...>ook/lib/python3.7/os.py'>
pd              module       <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
pid             int          2252
process         Process      psutil.Process(pid=2252, <...>3.7', started='11:08:16')
psutil          module       <module 'psutil' from '/U<...>ages/psutil/__init__.py'>
read_csv        function     <function _make_parser_fu<...>.parser_f at 0x11d663dd0>
sample_df       DataFrame         Unnamed: 0  sepal_le<...>n\n[150 rows x 6 columns]
sys             module       <module 'sys' (built-in)>
yapf_reformat   function     <function yapf_reformat at 0x10eafaa70>
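
The import behaviours shown in the cells above can be reproduced with the standard-library statistics module, so the sketch runs even where pandas is not installed. The alias stats is chosen only for this example, and the NameError branch assumes that nothing else has already bound the name statistics in your session.

```python
import statistics as stats     # binds the name `stats`; the name `statistics` is not bound
from statistics import mean    # binds the function name `mean` directly

# Both names refer to the very same function object.
print(stats.mean is mean)
print(mean([1, 2, 3]))

# The full module name was never bound in this namespace.
try:
    statistics.mean([1, 2, 3])
except NameError as error:
    print("NameError:", error)
```

An import statement only adds the names you ask for: the alias after "as", or the specific names after "from ... import".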


## One last thing - What's in a Python module

- When you import pandas as pd, Python loads the pandas module and binds the name pd to it as an entry point.
- When you write pd.read_csv(), the Python interpreter looks inside the pd module for a function named read_csv.
- The dir() function lets you list all the names available in a Python module: functions, classes, submodules, and so on.

In [17]:
dir(pd)

['BooleanDtype',
 'Categorical',
 'CategoricalDtype',
 'CategoricalIndex',
 'DataFrame',
 'DateOffset',
 'DatetimeIndex',
 'DatetimeTZDtype',
 'ExcelFile',
 'ExcelWriter',
 'Float64Index',
 'Grouper',
 'HDFStore',
 'Index',
 'IndexSlice',
 'Int16Dtype',
 'Int32Dtype',
 'Int64Dtype',
 'Int64Index',
 'Int8Dtype',
 'Interval',
 'IntervalDtype',
 'IntervalIndex',
 'MultiIndex',
 'NA',
 'NaT',
 'NamedAgg',
 'Period',
 'PeriodDtype',
 'PeriodIndex',
 'RangeIndex',
 'Series',
 'SparseDtype',
 'StringDtype',
 'Timedelta',
 'TimedeltaIndex',
 'Timestamp',
 'UInt16Dtype',
 'UInt32Dtype',
 'UInt64Dtype',
 'UInt64Index',
 'UInt8Dtype',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__docformat__',
 '__file__',
 '__getattr__',
 '__git_version__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_config',
 '_hashtable',
 '_is_numpy_dev',
 '_lib',
 '_libs',
 '_np_version_under1p14',
 '_np_version_under1p15',
 '_np_version_under1p16',
 '_np_version_under1p17',
 '_

- You don't need to remember the whole list, and you won't need most of these names anyway.
- This is only meant to help you understand what happens when you import something into Python.
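
In practice, the dir() output is easier to read after some filtering. The sketch below uses the standard-library json module as a stand-in for pandas, keeps only the names without a leading underscore, and shows that writing module.name is an attribute lookup that getattr can perform as well.

```python
import json

# Keep only the public names (those that do not start with an underscore).
public_names = [name for name in dir(json) if not name.startswith("_")]
print(public_names)

# `json.dumps` and getattr(json, "dumps") look up the same attribute.
dumps_function = getattr(json, "dumps")
print(dumps_function is json.dumps)
print(dumps_function({"a": 1}))
```

The same list comprehension works with dir(pd) once pandas is imported.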

## Take home message

- Jupyter Notebook uses the Python interpreter to run Python code, just like other ways of running Python.
- A live Jupyter Notebook is a Python process.
- All Python variables and modules you create or import live in the notebook's local variables, so you can create them in one cell and use them in another.