<a href="https://colab.research.google.com/github/faro7ah/python_project/blob/main/practice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<table style="float:left; border:none">
   <tr style="border:none">
       <td style="border:none">
           <a href="https://bokeh.org/">     
           <img 
               src="https://github.com/daniel-dc-cd/data_science/blob/master/daily_materials/bokeh-notebooks/tutorial/assets/bokeh-transparent.png?raw=1" 
               style="width:50px"
           >
           </a>    
       </td>
       <td style="border:none">
           <h1>Bokeh Tutorial</h1>
       </td>
   </tr>
</table>

<div style="float:right;"><h2>01. Basic Plotting</h2></div>

This section of the tutorial covers the [`bokeh.plotting`](https://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html) 
interface. This interface is a "mid-level" interface, and the main idea can be described by the statement:

**Starting from simple default figures (with sensible default tools, grids and axes), add markers and other shapes whose visual attributes are tied directly to data.**

We will see that it is possible to customize and change all of the defaults, but having them means that it is possible to get up and running very quickly. 
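As a minimal sketch of that idea, the snippet below creates a default figure and adds one set of markers whose position and size come from data. The values in `x`, `y`, and `sizes` are hypothetical, and `scatter` is used here as an assumption because it is available across Bokeh versions; the rest of this notebook uses `circle`.

```python
from bokeh.plotting import figure

# Hypothetical sample values, not taken from this notebook's data.
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]
sizes = [10, 15, 20, 25, 30]

# A default figure already comes with axes, grids, and a toolbar.
p = figure()

# One glyph call: each marker's position and size is read from the lists above.
renderer = p.scatter(x, y, size=sizes, color="navy", alpha=0.5)
```

Calling `show(p)` after an output mode has been selected would display the result.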

# Imports and Setup

When using the [`bokeh.plotting`](https://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html) interface, there are a few common imports:
* Use the [`figure`](https://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh.plotting.figure) function to create new plot objects to work with.
* Call the functions [`output_file`](https://bokeh.pydata.org/en/latest/docs/reference/resources_embedding.html#bokeh.io.output_file) or [`output_notebook`](https://bokeh.pydata.org/en/latest/docs/reference/resources_embedding.html#bokeh.io.output_notebook) (possibly in combination) to tell Bokeh how to display or save output.
* Execute [`show`](https://bokeh.pydata.org/en/latest/docs/reference/resources_embedding.html#bokeh.io.show) and [`save`](https://bokeh.pydata.org/en/latest/docs/reference/resources_embedding.html#bokeh.io.save) to display or save plots and layouts.
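This notebook only uses `output_notebook` and `show`. For comparison, here is a rough sketch of the file-based path with `output_file` and `save`. The temporary directory, the file name `basic_plot.html`, and the plotted values are all hypothetical.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

# A small figure with hypothetical values, just so there is something to save.
p = figure(title="Saved to HTML")
p.line([1, 2, 3], [4, 6, 5])

# Hypothetical output location inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "basic_plot.html")

# Tell Bokeh to write standalone HTML to this path, then write it without opening a browser.
output_file(out_path)
save(p)
```

Unlike `show`, `save` only writes the file; it does not try to display it.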

In [1]:
import numpy as np # we will use this later, so import it now
import pandas as pd
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

In this case, we are in the Jupyter notebook, so we will call `output_notebook()` below. We only need to call this once, and all subsequent calls to `show()` will display inline in the notebook.

In [2]:
output_notebook()

If everything is working, you should see a Bokeh logo and a message like *"BokehJS 1.4.0 successfully loaded."* as the output.

This notebook uses Bokeh sample data. If you haven't downloaded it already, you can do so by running the following:

In [3]:
import bokeh.sampledata
bokeh.sampledata.download()

Using data directory: /root/.bokeh/data
Skipping 'CGM.csv' (checksum match)
Skipping 'US_Counties.zip' (checksum match)
Skipping 'us_cities.json' (checksum match)
Skipping 'unemployment09.csv' (checksum match)
Skipping 'AAPL.csv' (checksum match)
Skipping 'FB.csv' (checksum match)
Skipping 'GOOG.csv' (checksum match)
Skipping 'IBM.csv' (checksum match)
Skipping 'MSFT.csv' (checksum match)
Skipping 'WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.zip' (checksum match)
Skipping 'gapminder_fertility.csv' (checksum match)
Skipping 'gapminder_population.csv' (checksum match)
Skipping 'gapminder_life_expectancy.csv' (checksum match)
Skipping 'gapminder_regions.csv' (checksum match)
Skipping 'world_cities.zip' (checksum match)
Skipping 'airports.json' (checksum match)
Skipping 'movies.db.zip' (checksum match)
Skipping 'airports.csv' (checksum match)
Skipping 'routes.csv' (checksum match)
Skipping 'haarcascade_frontalface_default.xml' (checksum match)


# Read Excel data



In [4]:
from google.colab import files
uploaded = files.upload()

Saving world-cup-data.xlsx to world-cup-data (1).xlsx


In [5]:
df = pd.read_excel("world-cup-data.xlsx")

In [6]:
df.head()

Unnamed: 0,game_id,attendance,team,tie,pk,stage,round,year,date,time,stadium,home,lat,long,referee,booked,url,goals,team_num
0,1,25000,Italy,False,False,FIRST ROUND,False,1934,27-05-1934 (16:00 h),16:00,Stadio Nazionale del PNF (Roma),Italy,41.926953,12.472197,René Mercet (SWI),,1934_ITALY_FS.htm,7,1
1,1,25000,USA,False,False,FIRST ROUND,False,1934,27-05-1934 (16:00 h),16:00,Stadio Nazionale del PNF (Roma),Italy,41.926953,12.472197,René Mercet (SWI),,1934_ITALY_FS.htm,1,2
2,2,16000,Austria,False,False,FIRST ROUND,False,1934,27-05-1934 (16:30 h),16:30,Benito Mussolini (Torino),Italy,45.066251,7.691228,John van Moorsel (NED),,1934_ITALY_FS.htm,3,1
3,2,16000,France,False,False,FIRST ROUND,False,1934,27-05-1934 (16:30 h),16:30,Benito Mussolini (Torino),Italy,45.066251,7.691228,John van Moorsel (NED),,1934_ITALY_FS.htm,2,2
4,3,8000,Germany,False,False,FIRST ROUND,False,1934,27-05-1934 (16:30 h),16:30,Giovanni Berta (Firenze),Italy,44.275234,11.722625,Francesco Mattea (ITA),,1934_ITALY_FS.htm,5,1


In [7]:
df.tail()

Unnamed: 0,game_id,attendance,team,tie,pk,stage,round,year,date,time,stadium,home,lat,long,referee,booked,url,goals,team_num
1667,835,57000,Austria,False,False,1/2 FINAL,False,1954,30-06-1954 (18:00 h),18:00,St. Jakob (Basel),Switzerland,47.5422544,7.6100127,Vincenzo Orlandini (ITA),,1954_SWITZERLAND_FS.htm,1,2
1668,836,32000,Austria,False,False,PLACES 3-4,False,1954,3-07-1954 (17:00 h),17:00,Hardturm (Zürich),Switzerland,47.393411,8.503633,Paul Wyssling (SWI),,1954_SWITZERLAND_FS.htm,3,1
1669,836,32000,Uruguay,False,False,PLACES 3-4,False,1954,3-07-1954 (17:00 h),17:00,Hardturm (Zürich),Switzerland,47.393411,8.503633,Paul Wyssling (SWI),,1954_SWITZERLAND_FS.htm,1,2
1670,837,62472,FRG,False,False,FINAL ROUND,False,1954,4-07-1954 (17:00 h),17:00,Wankdorf (Bern),Switzerland,46.963112,7.464874,William Ling (ENG),,1954_SWITZERLAND_FS.htm,3,1
1671,837,62472,Hungary,False,False,FINAL ROUND,False,1954,4-07-1954 (17:00 h),17:00,Wankdorf (Bern),Switzerland,46.963112,7.464874,William Ling (ENG),,1954_SWITZERLAND_FS.htm,2,2


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1672 entries, 0 to 1671
Data columns (total 19 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   game_id     1672 non-null   int64 
 1   attendance  1672 non-null   int64 
 2   team        1672 non-null   object
 3   tie         1672 non-null   bool  
 4   pk          1672 non-null   object
 5   stage       1672 non-null   object
 6   round       1672 non-null   object
 7   year        1672 non-null   int64 
 8   date        1672 non-null   object
 9   time        1672 non-null   object
 10  stadium     1672 non-null   object
 11  home        1672 non-null   object
 12  lat         1672 non-null   object
 13  long        1672 non-null   object
 14  referee     1672 non-null   object
 15  booked      576 non-null    object
 16  url         1672 non-null   object
 17  goals       1672 non-null   int64 
 18  team_num    1672 non-null   int64 
dtypes: bool(1), int64(5), object(13)
memory usage: 2

In [9]:
df.isnull().sum()

game_id          0
attendance       0
team             0
tie              0
pk               0
stage            0
round            0
year             0
date             0
time             0
stadium          0
home             0
lat              0
long             0
referee          0
booked        1096
url              0
goals            0
team_num         0
dtype: int64

In [10]:
df.duplicated().sum()

0

In [11]:
type(df.year)

pandas.core.series.Series

In [12]:
df.describe()

Unnamed: 0,game_id,attendance,year,goals,team_num
count,1672.0,1672.0,1672.0,1672.0,1672.0
mean,419.16866,44651.401914,1984.535885,1.416268,1.5
std,241.788452,23560.649582,22.293186,1.415118,0.50015
min,1.0,2000.0,1930.0,0.0,1.0
25%,209.75,28966.75,1970.0,0.0,1.0
50%,419.5,41000.0,1990.0,1.0,1.5
75%,628.25,60407.0,2002.0,2.0,2.0
max,837.0,179854.0,2014.0,9.0,2.0


In [13]:
df.head()

Unnamed: 0,game_id,attendance,team,tie,pk,stage,round,year,date,time,stadium,home,lat,long,referee,booked,url,goals,team_num
0,1,25000,Italy,False,False,FIRST ROUND,False,1934,27-05-1934 (16:00 h),16:00,Stadio Nazionale del PNF (Roma),Italy,41.926953,12.472197,René Mercet (SWI),,1934_ITALY_FS.htm,7,1
1,1,25000,USA,False,False,FIRST ROUND,False,1934,27-05-1934 (16:00 h),16:00,Stadio Nazionale del PNF (Roma),Italy,41.926953,12.472197,René Mercet (SWI),,1934_ITALY_FS.htm,1,2
2,2,16000,Austria,False,False,FIRST ROUND,False,1934,27-05-1934 (16:30 h),16:30,Benito Mussolini (Torino),Italy,45.066251,7.691228,John van Moorsel (NED),,1934_ITALY_FS.htm,3,1
3,2,16000,France,False,False,FIRST ROUND,False,1934,27-05-1934 (16:30 h),16:30,Benito Mussolini (Torino),Italy,45.066251,7.691228,John van Moorsel (NED),,1934_ITALY_FS.htm,2,2
4,3,8000,Germany,False,False,FIRST ROUND,False,1934,27-05-1934 (16:30 h),16:30,Giovanni Berta (Firenze),Italy,44.275234,11.722625,Francesco Mattea (ITA),,1934_ITALY_FS.htm,5,1


In [18]:
# .loc looks up index labels (0 to 1671 here), not years; assuming a year filter was intended:
row_df = df[df["year"].isin([1999, 2014])]

In [14]:
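# df.loc[2014] looks up index label 2014, which does not exist (the index runs 0 to 1671).
# Assuming the goal is to plot the 2014 matches, select them by the "year" column.
wc_2014 = df[df["year"] == 2014]
p = figure()
p.circle(x=wc_2014["goals"], y=wc_2014["attendance"])
show(p)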
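
A lookup such as `df.loc[2014]` raises a `KeyError` on this DataFrame because `.loc` selects by index label, and the index is a `RangeIndex` from 0 to 1671. The sketch below shows the difference on a small stand-in frame. The rows in `toy` are hypothetical and only mimic the column names of the World Cup data.

```python
import pandas as pd

# Hypothetical rows that mimic the notebook's columns; not the real data.
toy = pd.DataFrame({
    "team": ["Italy", "USA", "Brazil", "Germany"],
    "year": [1934, 1934, 2014, 2014],
    "goals": [7, 1, 1, 7],
    "attendance": [25000, 25000, 50000, 50000],
})

# .loc looks up index labels. The index here is 0..3, so 2014 is not a label.
try:
    toy.loc[2014]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A boolean mask selects rows by the values in the "year" column instead.
rows_2014 = toy[toy["year"] == 2014]

# isin() does the same for several years at once.
rows_both = toy[toy["year"].isin([1934, 2014])]

print(lookup_failed)
print(rows_2014[["team", "goals"]])
```

The filtered frame's columns can then be passed to a glyph method as `x` and `y`.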