# Code Club 001: introduction to Jupyter notebooks


In [13]:
import os
import time

code_club_rule = True
count = 1

while code_club_rule == True:

    if str(count)[-1] == '1':
        ordinal_ind = 'st'
    elif str(count)[-1] == '2':
        ordinal_ind = 'nd'
    elif str(count)[-1] == '3':
        ordinal_ind = 'rd'
    else:
        ordinal_ind = 'th'

    print(f'The {count}{ordinal_ind} rule of Code Club is...')
    print("... always talk about Code Club!!!")

    count += 1
    time.sleep(4)

    if count == 10:
        code_club_rule = False

The 1st rule of Code Club is...
... always talk about Code Club!!!
The 2nd rule of Code Club is...
... always talk about Code Club!!!
The 3rd rule of Code Club is...
... always talk about Code Club!!!
The 4th rule of Code Club is...
... always talk about Code Club!!!
The 5th rule of Code Club is...
... always talk about Code Club!!!
The 6th rule of Code Club is...
... always talk about Code Club!!!
The 7th rule of Code Club is...
... always talk about Code Club!!!
The 8th rule of Code Club is...
... always talk about Code Club!!!
The 9th rule of Code Club is...
... always talk about Code Club!!!


## Jupyter Notebooks

Jupyter is a not-for-profit project that develops open-source software for interactive computing across multiple programming languages.

  - The name Jupyter is derived from the three core programming languages supported: Julia, Python and R

Jupyter's most famous product is its *Notebook* interface, which allows users to combine text and code cells:

  - Text cells are written in `markdown`, a simple markup language that should be familiar to anyone who has written for a wiki (**n.b.** it's also supported in Teams)
    - here's a guide to its use: [https://www.markdownguide.org/](https://www.markdownguide.org/)

  - Code cells can be written in `Julia`, `python` and `R`, but the most common language used is `python`

In 2017, Jupyter Notebooks won the prestigious ACM Software System Award [https://blog.jupyter.org/jupyter-receives-the-acm-software-system-award-d433b0dfe3a2](https://blog.jupyter.org/jupyter-receives-the-acm-software-system-award-d433b0dfe3a2)

There is a range of platforms that allow you to create and distribute notebooks via the internet, including [Kaggle.com](https://www.kaggle.com/) and [Google Colab](https://colab.google/).
Notebooks and these platforms are very popular amongst data scientists: they are used for prototyping and sharing experiments, and the platforms offer free access to GPUs (very useful for machine learning/AI).
Jupyter Notebooks can also be used to write presentations, e.g. [RISE](https://rise.readthedocs.io/en/latest/), and even to write books, e.g. [Deep Learning for Coders with fastai and PyTorch](https://github.com/fastai/fastbook).

## Kaggle

As mentioned above, Kaggle is a platform for creating, editing and sharing Jupyter Notebooks. It also hosts competitions (usually AI/machine learning) and makes datasets available, e.g. Seattle Library's Collection Inventory [https://www.kaggle.com/datasets/city-of-seattle/seattle-library-collection-inventory](https://www.kaggle.com/datasets/city-of-seattle/seattle-library-collection-inventory)

For today's Code Club session, I have created a Notebook in Kaggle - link shared via Teams

Once you have the link open, and you are signed into Kaggle, you should get the option to *Copy & Edit* in the top right-hand side of the page:
    
   - clicking that will save a version of the notebook to your profile and allow you to execute the code, add, remove and edit sections and access it wherever you have an internet connection

The notebook is designed to be a whistle-stop tour through `python`, `pandas` and `plotly`. These are all huge topics, but this notebook takes inspiration from Fastai's [Jeremy Howard](https://jeremy.fast.ai/) and uses a top-down approach. An analogy to explain this is to imagine learning to play a sport or game as a child: the first step isn't to learn all of the rules and their intricacies, it's to play! And through doing so, the game and its rules begin to reveal themselves and make sense.

In that spirit, this notebook was written to get you from (potentially) knowing no python to reading data from a file and creating plots of the data, using tools that are free, flexible and open to virtually endless experimentation. If when you read through this notebook things don't immediately make sense, then don't worry too much. For each section, I've linked to other resources and tutorials which go into more depth about the topic and should help deepen your understanding.

## Python

Python is named in reference to Monty Python. This is important because it's indicative of the mindset of lots of Python developers. Tutorials and documentation often contain references to the eponymous comedy troupe:

  - the official input and output documentation is littered with references to *Monty Python and the Holy Grail* [https://docs.python.org/3.8/tutorial/inputoutput.html#the-string-format-method](https://docs.python.org/3.8/tutorial/inputoutput.html#the-string-format-method)
  - the first chapter of [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/2e/chapter1/) contains 44 instances of [spam](https://youtu.be/anwy2MPT5RE?si=g0Y-jedgdwK-oyyp)

The community of Python developers pride themselves on doing things the 'pythonic' way. There is an entire code writing style guide [PEP 8](https://peps.python.org/pep-0008/) that goes into fine detail. The key underlying principles of being *pythonic* can be summarised as follows:

  - code should be written in a readable manner
  - explicit is better than implicit
  - simple is better than complex

This approach makes Python easier to learn, improves maintainability and generally makes understanding others' (and sometimes your own) code easier.

By way of an example here is a comparison between JavaScript and Python: both code snippets will define a list of pet names, loop through them and display them to the user. 

**JavaScript:** has a more verbose syntax and can be harder to read

  ```js
  const pets = ['Dave the dog', 'Sammy the snake','Leo the lion','Kozzy the kangaroo'];

  for (let i=0; i < pets.length; i++) {
    console.log(pets[i]);
  }
  ```



**Python:** has a less busy syntax and reads more like plain English:

In [4]:
pets = ['Dave the dog', 'Sammy the snake','Leo the lion','Kozzy the kangaroo']

for pet in pets:
    print(pet)

Dave the dog
Sammy the snake
Leo the lion
Kozzy the kangaroo
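
If the position of each pet is needed as well (which is what the `i` in the JavaScript loop provides), Python's built-in `enumerate` is one common way to get it. This is only a small sketch; the `numbered_pets` list is a name made up for illustration:

```python
pets = ['Dave the dog', 'Sammy the snake', 'Leo the lion', 'Kozzy the kangaroo']

numbered_pets = []  # hypothetical name, used here to collect the results

# enumerate() hands back (position, item) pairs; start=1 begins counting at 1
for position, pet in enumerate(pets, start=1):
    numbered_pets.append(f'{position}. {pet}')
    print(f'{position}. {pet}')
```

The loop still reads close to plain English, with no manual index bookkeeping.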


### Python Basics & Conventions

Python follows a series of conventions, many of which are observed by most other programming languages. They are important to grasp because they help you understand how python and this notebook work. In the code samples there may be parts of the specific syntax that you don't understand (yet!). That isn't too important at the moment; getting to grips with the broader principles is!

#### 1. Scripts are read and executed from top-to-bottom

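Much like English text, Python code is "read" by the computer left-to-right and top-to-bottom, so a variable must be defined before the line that uses it. e.g. the code block below will cause an error to be returned, because `asparagus` is used before it has been given a value: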

In [6]:
carrots = .50
peas = .25
potatoes = .25

sub_total = carrots + peas + asparagus + potatoes # this line will cause an error

asparagus = 2.00

# sub_total = carrots + peas + asparagus + potatoes 


NameError: name 'asparagus' is not defined

**N.B.** In the code block above, the `#` symbol is used to indicate a comment on the 5th line. On the final line it's used to "comment out" the code. This is common practice, particularly when you're writing and figuring out the code. Kaggle allows you to toggle between commented and uncommented code with the shortcut "ctrl + /"
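
For comparison, here is a sketch of the same calculation with `asparagus` defined before the line that uses it. The prices are the ones from the cell above; only the order has changed:

```python
carrots = .50
peas = .25
potatoes = .25
asparagus = 2.00  # now defined before it is used

sub_total = carrots + peas + asparagus + potatoes
print(sub_total)
```

Because every name exists by the time Python reaches the `sub_total` line, no `NameError` is raised.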

#### 2. Indentation is important

Python allows the programmer to define *blocks* of code that are only executed if particular conditions are met. These blocks are identified by their indentation, by convention 4 spaces per level (whatever you use must be consistent within a block). Blocks can also be nested within other blocks.

In Python the simplest way of controlling the flow is the use of `if`, `elif` (*i.e.* else if) and `else` statements

e.g. a simple temperature control system. Try changing the temperature variable to different numerical values to see how the outcome is affected

In [10]:
temperature = 11

if temperature < 20:
    # note the indentation
    print('Temperature is below 20, close the windows')
    if temperature < 10:
        print('Turn up the heating')

elif temperature < 30:
    print('Open the windows!')

else:
    print('Turn on the AC!')


Temperature is below 20, close the windows


**N.B.** most code editors (including Kaggle) will recognise that you're writing Python and automatically indent the required 4 spaces when appropriate.
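
To see every branch without editing the variable by hand, the same checks can be wrapped in a function and run against several temperatures. This is an illustrative sketch: the `advise` function is a made-up name, and `def` (which defines a function) has not been covered yet in this notebook:

```python
def advise(temperature):
    """Return the messages the temperature check above would print."""
    messages = []
    if temperature < 20:
        messages.append('Temperature is below 20, close the windows')
        if temperature < 10:
            # nested block: only reached when the outer condition is also true
            messages.append('Turn up the heating')
    elif temperature < 30:
        messages.append('Open the windows!')
    else:
        messages.append('Turn on the AC!')
    return messages


for temperature in [5, 11, 25, 35]:
    print(temperature, advise(temperature))
```

Each extra level of indentation marks a block that belongs to the line above it.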

#### 3. Python is very literal and precise

I once heard writing code described as *"teaching a rock how to think"*. To that end a python script will only do what you've instructed it to do. Sometimes you may think you've told it to do something and the results suggest otherwise. This can be (very) frustrating, but unfortunately it's usually our fault!

Because a Python script is interpreted very literally, it is important to be precise. This includes details that our brains have learned to ignore or be flexible with, e.g. capitalisation

In [11]:
my_fav_team = 'Manchester United'

your_fav_team = 'Manchester united'

print('Do we have the same favourite team? ', my_fav_team == your_fav_team)

Do we have the same favourite team?  False


Common sense may tell us that `my_fav_team` and `your_fav_team` are referring to the same thing, but the literal nature of Python means that the difference in capitalisation of the two *strings* is significant. In practice python offers some easy ways to resolve this:

e.g. the `.lower()` method makes everything in the string lowercase, allowing for a more flexible comparison:

In [12]:
my_fav_team = 'Manchester United'

your_fav_team = 'Manchester united'

print('Do we have the same favourite team? ', my_fav_team.lower() == your_fav_team.lower())

Do we have the same favourite team?  True
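
Capitalisation is not the only detail that can trip up a comparison; stray spaces count as characters too. As a small sketch (the untidy value of `your_fav_team` is invented for illustration), the string methods `.strip()` and `.lower()` can be chained:

```python
my_fav_team = 'Manchester United'
your_fav_team = '  manchester UNITED '  # invented example with extra spaces

# .strip() removes leading/trailing whitespace, .lower() lowercases the result
same_team = my_fav_team.strip().lower() == your_fav_team.strip().lower()

print('Do we have the same favourite team? ', same_team)
```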


Python's literal-ness extends to data types too. Simple data types in python fall into the following categories:

  1. `strings` - any collection of characters that are encompassed between either two `'` or two `"` characters
  2. `ints` - any whole number, positive or negative
  3. `floats` - any number with a decimal point
  4. `bool` - always either `True` or `False` (note the capitalisation)
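
One way to check which of these types Python has assigned to a value is the built-in `type()` function. A short sketch, where the example values and the `type_names` list are chosen purely for illustration:

```python
examples = ['Code Club', 42, 0.25, True]

type_names = []  # hypothetical name, collects the type of each example
for value in examples:
    # type(value).__name__ gives the short name of the type, e.g. 'str'
    type_names.append(type(value).__name__)
    print(repr(value), 'is a', type(value).__name__)
```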

e.g. when does `'2' + '2'` not equal `4`?

In [None]:
x = '2'
y = '2'

print(x + y)

Again, our common sense may tell us to expect the output to be `4`; however, python is interpreting the `x` and `y` variables as strings, because we've told it to by using `'` around each value. Adding two strings joins them together, so the output is `22`.

Here are two ways to fix it:

In [14]:

# Method 1: casting the variables as ints
x = '2'
y = '2'

print(int(x) + int(y))


# Method 2: creating the variables as ints to begin with
x = 2
y = 2

print(x + y)

4
4
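
Casting works in other directions too, and it fails loudly when the text is not a number. The following is a hedged sketch with made-up values, using the built-ins `int()`, `float()` and `str()`:

```python
x = '2'
y = '2.5'

total = int(x) + float(y)  # 2 + 2.5
joined = str(2) + str(2)   # numbers cast to strings are joined, not added

print(total)
print(joined)

try:
    int('two')  # the word 'two' cannot be read as a whole number
except ValueError as error:
    print('Could not convert:', error)
```

The `try`/`except` pair is one way of catching the `ValueError` instead of letting it stop the script.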


## Pandas

Another section, another silly name. **PANDAS** is a portmanteau of an economics term: **PAN**el **DA**ta (with an **S** added for good measure). It is a vast and very well-used Python package and is most useful for working with lots of data.

At the core of Pandas is the concept of the `DataFrame` which is essentially a representation of data in table form (similar to Excel or a database). The `DataFrame` consists of rows of data, organised into columns with an index. 

The thing that makes Pandas so useful is that it provides lots of functionality to understand, shape and transform your data. As if that wasn't enough, it also has out-of-the-box support for reading and writing data in a variety of file formats including Excel and CSV, as well as built-in SQL database connectors.

In this example, a DataFrame is constructed from a two-dimensional array (a list of lists, one inner list per row). This is a toy example and is only intended to give an indication of what a DataFrame is.

In [3]:
import pandas as pd

sample_array = [
    ['A',1,2.0],
    ['B',1,2.1],
    ['C',1,2.2]
]

df = pd.DataFrame(sample_array)

df

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | A | 1 | 2.0 |
| 1 | B | 1 | 2.1 |
| 2 | C | 1 | 2.2 |
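
The column headers `0`, `1` and `2` are assigned automatically because none were given. As a sketch, the `columns` argument of `pd.DataFrame` can supply more meaningful ones; the header names used here are invented for illustration:

```python
import pandas as pd

sample_array = [
    ['A', 1, 2.0],
    ['B', 1, 2.1],
    ['C', 1, 2.2]
]

# columns= labels each position in the inner lists (names are hypothetical)
named_df = pd.DataFrame(sample_array, columns=['letter', 'whole_number', 'decimal_number'])

print(named_df)
```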


Pandas also allows us to easily read data from a csv file.

I have created a sample dataset that contains some sample data related to the Library's visitor numbers and is saved in a file called `slv_vemcount_wk1_jun_2022.csv`. The rest of the Pandas section works with this dataset to demonstrate some features that allow us to better understand our data.

To begin with we'll create a variable that tells our code where the csv file is stored, then use `read_csv` to open and display the data.

**n.b.** `header=0` tells pandas that the first row contains the column headers.

In [75]:
import os

current_dir = os.getcwd()
path_to_csv_file = os.path.join(current_dir,'data','slv_vemcount_wk1_jun_2022.csv')

df = pd.read_csv(path_to_csv_file,header=0)

df

| | name | count_in | count_out | inside | date | time |
|---|---|---|---|---|---|---|
| 0 | Defunct | 27 | 17 | 10 | 2022-06-07 | 08:00:00 |
| 1 | Defunct | 23 | 21 | 2 | 2022-06-06 | 08:00:00 |
| 2 | Defunct | 127 | 19 | 118 | 2022-06-07 | 08:30:00 |
| 3 | Defunct | 37 | 123 | 32 | 2022-06-07 | 09:00:00 |
| 4 | Defunct | 49 | 131 | -50 | 2022-06-07 | 09:30:00 |
| ... | ... | ... | ... | ... | ... | ... |
| 19987 | Zone - All External Entries to Building Child | 0 | 14 | -14 | 2022-06-12 | 17:30:00 |
| 19988 | Zone - All External Entries to Building Child | 4 | 7 | -3 | 2022-06-12 | 18:00:00 |
| 19989 | Zone - All External Entries to Building Child | 0 | 6 | -6 | 2022-06-12 | 18:30:00 |
| 19990 | Zone - All External Entries to Building Child | 0 | 3 | -3 | 2022-06-12 | 19:00:00 |
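
If the csv file is not to hand, `read_csv` can be tried out on text held in memory instead. This is a sketch under stated assumptions: the three rows and the "Entrance A"/"Entrance B" names are invented, and only the column layout is borrowed from the visitor-count file:

```python
import io

import pandas as pd

# Hypothetical rows in the same column layout as the visitor-count file
csv_text = """name,count_in,count_out,inside,date,time
Entrance A,27,17,10,2022-06-07,08:00:00
Entrance A,23,21,2,2022-06-06,08:00:00
Entrance B,4,7,-3,2022-06-12,18:00:00
"""

# io.StringIO lets read_csv treat a string as if it were an open file
sample_df = pd.read_csv(io.StringIO(csv_text), header=0)

print(sample_df)
```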


Now that we have the csv loaded to a DataFrame there are lots of handy methods that can help us understand our data.

e.g. `columns` will return a list of the column headers

In [76]:
df.columns

Index(['name', 'count_in', 'count_out', 'inside', 'date', 'time'], dtype='object')

`shape` returns the number of rows and columns in the following format `(rows,columns)`

In [77]:
df.shape

(19992, 6)

`info()` will summarise each column, tell us how many rows in each column are not empty (non-null) and the "Dtype". Dtype is the data type that pandas has inferred for each column e.g. whole numbers will be represented as `int64` 

In [78]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19992 entries, 0 to 19991
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   name       19992 non-null  object
 1   count_in   19992 non-null  int64 
 2   count_out  19992 non-null  int64 
 3   inside     19992 non-null  int64 
 4   date       19992 non-null  object
 5   time       19992 non-null  object
dtypes: int64(3), object(3)
memory usage: 937.2+ KB


`describe()` will return statistical summaries for each of the columns that have a numerical Dtype

In [79]:
df.describe()

| | count_in | count_out | inside |
|---|---|---|---|
| count | 19992.0 | 19992.0 | 19992.0 |
| mean | 164.752001 | 140.679772 | 26.407063 |
| std | 2805.870132 | 1848.208015 | 1426.835554 |
| min | 0.0 | 0.0 | -38404.0 |
| 25% | 0.0 | 0.0 | -1.0 |
| 50% | 4.0 | 4.0 | 0.0 |
| 75% | 28.0 | 28.0 | 2.0 |
| max | 237117.0 | 134102.0 | 141014.0 |
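
The individual figures that `describe()` reports can also be requested one at a time. A minimal sketch on a tiny invented DataFrame (the names and numbers are not from the real dataset):

```python
import pandas as pd

# Invented sample data, small enough to check by hand
sample_df = pd.DataFrame({
    'name': ['Entrance A', 'Entrance A', 'Entrance B'],
    'count_in': [27, 23, 4],
    'count_out': [17, 21, 7],
})

print(sample_df['count_in'].mean())        # average of one column
print(sample_df['count_in'].max())         # largest value in one column
print(sample_df['name'].value_counts())    # how often each name appears
```

`value_counts()` is handy for text columns, which `describe()` skips by default.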


There are several ways to select specific column(s) from a DataFrame. Here's the simplest:

In [80]:
df['name']

0                                              Defunct
1                                              Defunct
2                                              Defunct
3                                              Defunct
4                                              Defunct
                             ...                      
19987    Zone - All External Entries to Building Child
19988    Zone - All External Entries to Building Child
19989    Zone - All External Entries to Building Child
19990    Zone - All External Entries to Building Child
19991    Zone - All External Entries to Building Child
Name: name, Length: 19992, dtype: object
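
To select more than one column, a list of column names can be passed inside the square brackets. A short sketch on invented data (the rows are hypothetical):

```python
import pandas as pd

# Invented sample data
sample_df = pd.DataFrame({
    'name': ['Entrance A', 'Entrance A', 'Entrance B'],
    'count_in': [27, 23, 4],
    'count_out': [17, 21, 7],
})

# note the double brackets: the inner pair is a Python list of column names
subset = sample_df[['name', 'count_in']]

print(subset)
```

Selecting one column returns a pandas `Series`, while selecting a list of columns returns a `DataFrame`.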

You can also filter rows according to their values. This example returns all rows whose `name` is not equal to "Defunct"

**N.B.** `!=` is the python way of indicating not equal

In [81]:
df[df['name'] != 'Defunct']

| | name | count_in | count_out | inside | date | time |
|---|---|---|---|---|---|---|
| 25 | Dome Gallery L4 Lift exit/entry test | 0 | 0 | 0 | 2022-06-07 | 08:00:00 |
| 26 | Dome Gallery L4 Lift exit/entry test | 0 | 0 | 0 | 2022-06-07 | 08:30:00 |
| 27 | Dome Gallery L4 Lift exit/entry test | 0 | 0 | 0 | 2022-06-07 | 09:00:00 |
| 28 | Dome Gallery L4 Lift exit/entry test | 0 | 0 | 0 | 2022-06-07 | 09:30:00 |
| 29 | Dome Gallery L4 Lift exit/entry test | 0 | 0 | 0 | 2022-06-07 | 10:00:00 |
| ... | ... | ... | ... | ... | ... | ... |
| 19987 | Zone - All External Entries to Building Child | 0 | 14 | -14 | 2022-06-12 | 17:30:00 |
| 19988 | Zone - All External Entries to Building Child | 4 | 7 | -3 | 2022-06-12 | 18:00:00 |
| 19989 | Zone - All External Entries to Building Child | 0 | 6 | -6 | 2022-06-12 | 18:30:00 |
| 19990 | Zone - All External Entries to Building Child | 0 | 3 | -3 | 2022-06-12 | 19:00:00 |


Other comparison operators that can be used for filtering include:

  - equal to: `==`
  - less than: `<`
  - more than: `>`
  - less than or equal to: `<=`
  - more than or equal to: `>=`

In [82]:
df[df['inside'] > 0]

| | name | count_in | count_out | inside | date | time |
|---|---|---|---|---|---|---|
| 0 | Defunct | 27 | 17 | 10 | 2022-06-07 | 08:00:00 |
| 1 | Defunct | 23 | 21 | 2 | 2022-06-06 | 08:00:00 |
| 2 | Defunct | 127 | 19 | 118 | 2022-06-07 | 08:30:00 |
| 3 | Defunct | 37 | 123 | 32 | 2022-06-07 | 09:00:00 |
| 5 | Defunct | 26343 | 3036 | 23307 | 2022-06-07 | 10:00:00 |
| ... | ... | ... | ... | ... | ... | ... |
| 19978 | Zone - All External Entries to Building Child | 20 | 10 | 10 | 2022-06-12 | 13:00:00 |
| 19980 | Zone - All External Entries to Building Child | 16 | 13 | 3 | 2022-06-12 | 14:00:00 |
| 19981 | Zone - All External Entries to Building Child | 16 | 14 | 2 | 2022-06-12 | 14:30:00 |
| 19982 | Zone - All External Entries to Building Child | 17 | 14 | 3 | 2022-06-12 | 15:00:00 |
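
Filters can also be combined. In pandas this is done with `&` (and) and `|` (or), with each condition wrapped in its own parentheses. A sketch on invented data; the variable names `busy_entrance_a` and `quiet_or_leaving` are made up for illustration:

```python
import pandas as pd

# Invented sample data
sample_df = pd.DataFrame({
    'name': ['Entrance A', 'Entrance A', 'Entrance B'],
    'count_in': [27, 23, 4],
    'count_out': [17, 21, 7],
})

# both conditions must be true
busy_entrance_a = sample_df[(sample_df['name'] == 'Entrance A') & (sample_df['count_in'] > 25)]

# at least one condition must be true
quiet_or_leaving = sample_df[(sample_df['count_in'] < 5) | (sample_df['count_out'] > 20)]

print(busy_entrance_a)
print(quiet_or_leaving)
```

The parentheses matter: without them Python would evaluate `&` and `|` before the comparisons.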


It is also easy to add new columns to a DataFrame. Here's how to add a new column where all the values are the same:

In [83]:
df['institution'] = 'State Library Victoria'

df['institution']

0        State Library Victoria
1        State Library Victoria
2        State Library Victoria
3        State Library Victoria
4        State Library Victoria
                  ...          
19987    State Library Victoria
19988    State Library Victoria
19989    State Library Victoria
19990    State Library Victoria
19991    State Library Victoria
Name: institution, Length: 19992, dtype: object

You can also add a new column with dynamic values e.g. one that concatenates the 'name' and 'institution' columns

In [84]:
df['verbose name'] = df['name'] + ' ' + df['institution']
df

| | name | count_in | count_out | inside | date | time | institution | verbose name |
|---|---|---|---|---|---|---|---|---|
| 0 | Defunct | 27 | 17 | 10 | 2022-06-07 | 08:00:00 | State Library Victoria | Defunct State Library Victoria |
| 1 | Defunct | 23 | 21 | 2 | 2022-06-06 | 08:00:00 | State Library Victoria | Defunct State Library Victoria |
| 2 | Defunct | 127 | 19 | 118 | 2022-06-07 | 08:30:00 | State Library Victoria | Defunct State Library Victoria |
| 3 | Defunct | 37 | 123 | 32 | 2022-06-07 | 09:00:00 | State Library Victoria | Defunct State Library Victoria |
| 4 | Defunct | 49 | 131 | -50 | 2022-06-07 | 09:30:00 | State Library Victoria | Defunct State Library Victoria |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19987 | Zone - All External Entries to Building Child | 0 | 14 | -14 | 2022-06-12 | 17:30:00 | State Library Victoria | Zone - All External Entries to Building Child ... |
| 19988 | Zone - All External Entries to Building Child | 4 | 7 | -3 | 2022-06-12 | 18:00:00 | State Library Victoria | Zone - All External Entries to Building Child ... |
| 19989 | Zone - All External Entries to Building Child | 0 | 6 | -6 | 2022-06-12 | 18:30:00 | State Library Victoria | Zone - All External Entries to Building Child ... |
| 19990 | Zone - All External Entries to Building Child | 0 | 3 | -3 | 2022-06-12 | 19:00:00 | State Library Victoria | Zone - All External Entries to Building Child ... |
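
Dynamic columns are not limited to text; arithmetic between numeric columns works row by row in the same way. A sketch on invented data, where the `net_flow` column name is hypothetical and not part of the real dataset:

```python
import pandas as pd

# Invented sample data
sample_df = pd.DataFrame({
    'name': ['Entrance A', 'Entrance A', 'Entrance B'],
    'count_in': [27, 23, 4],
    'count_out': [17, 21, 7],
})

# the subtraction is applied to each row in turn
sample_df['net_flow'] = sample_df['count_in'] - sample_df['count_out']

print(sample_df)
```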


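There are lots of ways that Pandas can help to summarise data too, e.g. `groupby()` collects the rows that share the same values in the chosen columns and `sum()` then totals the remaining column for each group: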

In [85]:
columns = ['verbose name','time','count_in']
grouped_df = df[columns]
grouped_df = grouped_df.groupby(['verbose name','time']).sum()

grouped_df

| verbose name | time | count_in |
|---|---|---|
| Defunct State Library Victoria | 08:00:00 | 16867 |
| Defunct State Library Victoria | 08:30:00 | 4712 |
| Defunct State Library Victoria | 09:00:00 | 782 |
| Defunct State Library Victoria | 09:30:00 | 864 |
| Defunct State Library Victoria | 10:00:00 | 170825 |
| ... | ... | ... |
| Zone - All External Entries to Building Child State Library Victoria | 17:30:00 | 24 |
| Zone - All External Entries to Building Child State Library Victoria | 18:00:00 | 31 |
| Zone - All External Entries to Building Child State Library Victoria | 18:30:00 | 9 |
| Zone - All External Entries to Building Child State Library Victoria | 19:00:00 | 8 |
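
A grouped result can hold more than one summary at a time. The sketch below uses invented data and the `agg()` method to calculate a total and an average per group; `reset_index()` then turns the group labels back into an ordinary column:

```python
import pandas as pd

# Invented sample data
sample_df = pd.DataFrame({
    'name': ['Entrance A', 'Entrance A', 'Entrance B'],
    'count_in': [27, 23, 4],
})

# one total per name
totals = sample_df.groupby('name')['count_in'].sum()

# total and average per name, with 'name' restored as a normal column
summary = sample_df.groupby('name')['count_in'].agg(['sum', 'mean']).reset_index()

print(totals)
print(summary)
```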


Finally, let's combine a few of these techniques to create a new DataFrame that can be exported using the `to_csv()` method

In [89]:
columns_for_export = ['verbose name','date','count_in']
current_dir = os.getcwd()
csv_export_path = os.path.join(current_dir,'data','grouped_slv_vemcount.csv')

df_for_export = df[df['name'] != 'Defunct']
df_for_export = df_for_export[columns_for_export]
df_for_export = df_for_export.groupby(['verbose name','date']).sum()
df_for_export.to_csv(csv_export_path)
df_for_export

| verbose name | date | count_in |
|---|---|---|
| Dome Gallery L4 Lift exit/entry test State Library Victoria | 2022-06-06 | 0 |
| Dome Gallery L4 Lift exit/entry test State Library Victoria | 2022-06-07 | 0 |
| Dome Gallery L4 Lift exit/entry test State Library Victoria | 2022-06-08 | 0 |
| Dome Gallery L4 Lift exit/entry test State Library Victoria | 2022-06-09 | 0 |
| Dome Gallery L4 Lift exit/entry test State Library Victoria | 2022-06-10 | 0 |
| ... | ... | ... |
| Zone - All External Entries to Building Child State Library Victoria | 2022-06-08 | 191 |
| Zone - All External Entries to Building Child State Library Victoria | 2022-06-09 | 231 |
| Zone - All External Entries to Building Child State Library Victoria | 2022-06-10 | 213 |
| Zone - All External Entries to Building Child State Library Victoria | 2022-06-11 | 208 |
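
To preview what `to_csv()` would write without creating a file, it can be called with no path, in which case it returns the csv text as a string. A sketch on invented data:

```python
import pandas as pd

# Invented sample data
sample_df = pd.DataFrame({
    'name': ['Entrance A', 'Entrance A', 'Entrance B'],
    'count_in': [27, 23, 4],
})

grouped = sample_df.groupby('name').sum()

# with no path given, to_csv() hands back the text instead of saving it
csv_text = grouped.to_csv()

print(csv_text)
```

The group labels (the index) are written as the first column, which is why `name` appears in the header row.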


## Plotly

Plotly is an open source graphing library. It began as a JavaScript library but has been adapted so that it interfaces with many different coding languages, including Python! The Python integration also interfaces with Pandas very nicely, so you can prepare your data in a Pandas DataFrame and then create some very nifty graphs with relative ease.

Plotly is not part of the Python standard library, meaning that it needs to be installed before we can use it. The most common method for installing Python packages is `pip install <package name>`, run as a shell command (e.g. in *bash*). In Jupyter, shell commands are indicated through the use of `!` at the beginning of the line, e.g. see below

In [14]:
! pip install plotly




[notice] A new release of pip is available: 23.0.1 -> 23.2.1
[notice] To update, run: python.exe -m pip install --upgrade pip


*n.b.* `pip` stands for *"pip installs packages"*, which is a recursive name because the pip at the beginning also stands for *"pip installs packages"*. Another fine example of Python programming humour.
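
Before installing anything, it is possible to check from Python whether a package can already be found. This is a rough sketch using the standard library's `importlib.util.find_spec`; the `is_installed` helper is a made-up name, and note that a package's import name is not always identical to the name used with `pip install`:

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed('json'))    # part of the standard library
print(is_installed('plotly'))  # depends on whether pip install has been run
```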

## Scatter plot example

The following are all adapted from the *Basic Charts > Scatter Plots* Plotly docs page [https://plotly.com/python/line-and-scatter/](https://plotly.com/python/line-and-scatter/)

*n.b.* the *Plotly* docs will often make reference to their framework *Dash* - this is a more complete data analytics tool, similar to Power BI and a bit beyond the scope of this notebook.

The first plot below shows how simple it is to construct a scatter chart. The `x` and `y` variables contain lists that will be plotted.

It's worth noting that all the examples follow common naming conventions, e.g. the figure to be plotted is stored in a variable called `fig`. This is another example of the Python community embracing readability.

In [15]:
import plotly.express as px
fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])
fig.show()

When you install Plotly, you are also provided with sample datasets [https://github.com/plotly/datasets](https://github.com/plotly/datasets)

The following example uses one of these datasets, `iris`, which is loaded as a Pandas DataFrame

In [16]:
import plotly.express as px
df = px.data.iris()

Before plotting anything, it can be useful/wise to use some Pandas methods we explored above to understand the data a bit better e.g.

  - `df.columns` to see the column names
  - `df.shape` to see the height and width of the data
  - `df.describe()` to give a statistical summary of the data
  - `df.info()` to describe the data type and *non-null* count for each column

In [17]:
print(df.columns)
print(df.describe())
print(df.info())
print(df.shape)

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species',
       'species_id'],
      dtype='object')
       sepal_length  sepal_width  petal_length  petal_width  species_id
count    150.000000   150.000000    150.000000   150.000000  150.000000
mean       5.843333     3.054000      3.758667     1.198667    2.000000
std        0.828066     0.433594      1.764420     0.763161    0.819232
min        4.300000     2.000000      1.000000     0.100000    1.000000
25%        5.100000     2.800000      1.600000     0.300000    1.000000
50%        5.800000     3.000000      4.350000     1.300000    2.000000
75%        6.400000     3.300000      5.100000     1.800000    3.000000
max        7.900000     4.400000      6.900000     2.500000    3.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    flo

Now we know a bit more about the *iris* data we can assign the column names to the `x` and `y` variables in the construction of the figure

In [18]:
fig = px.scatter(df, x="sepal_width", y="sepal_length")
fig.show()

In [19]:
fig = px.scatter(df, x="petal_width", y="petal_length")
fig.show()

Finally, more *keyword arguments* can be passed to `px.scatter` to further enrich the plot:

  - `size` is set to *petal_length* and will determine the size of the dot
  - `color` is set to *species* and will assign different colours to each species
  - `hover_data` is set to *petal_width* and will include that in the information displayed when each plot point is hovered over

This is a relatively small selection of the ways in which the scatter plot can be customised. The full documentation lists all of the options [https://plotly.com/python/reference/scatter/](https://plotly.com/python/reference/scatter/)

In [20]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
                 size='petal_length', hover_data=['petal_width'])
fig.show()

## Things to give a go yourself...

- try visualising some of the data from the `slv_vemcount_wk1_jun_2022.csv` file using plotly
- if you work with data in excel or csv format, try using pandas to read and transform the data

### Useful resources

#### Python
  - Kaggle learn Python pathway [https://www.kaggle.com/learn/python](https://www.kaggle.com/learn/python)

#### Pandas
  - 10 mins to Pandas [https://pandas.pydata.org/docs/user_guide/10min.html](https://pandas.pydata.org/docs/user_guide/10min.html)
  - Datacamp Python pandas tutorial for beginners [https://www.datacamp.com/tutorial/pandas](https://www.datacamp.com/tutorial/pandas)

#### Plotly
  - plotly official docs [https://plotly.com/python/](https://plotly.com/python/)
  - Kaggle Plotly tutorial for beginners [https://www.kaggle.com/code/kanncaa1/plotly-tutorial-for-beginners](https://www.kaggle.com/code/kanncaa1/plotly-tutorial-for-beginners) 