# Content with notebooks

You can also create content with Jupyter Notebooks. This means that you can include
code blocks and their outputs in your book.

## Markdown + notebooks

Because notebook text cells are Markdown, you can embed images, HTML, etc. into your posts!

![](https://myst-parser.readthedocs.io/en/latest/_static/logo-wide.svg)

You can also $add_{math}$ and

$$
math^{blocks}
$$

or

$$
\begin{aligned}
\mbox{mean} la_{tex} \\ \\
math blocks
\end{aligned}
$$

But make sure you \$Escape \$your \$dollar signs \$you want to keep!

## MyST markdown

MyST markdown works in Jupyter Notebooks as well. For more information about MyST markdown, check
out [the MyST guide in Jupyter Book](https://jupyterbook.org/content/myst.html),
or see [the MyST markdown documentation](https://myst-parser.readthedocs.io/en/latest/).

## Code blocks and outputs

Jupyter Book will also embed your code blocks and output in your book.
For example, here's some sample Matplotlib code:

In [None]:
from matplotlib import rcParams, cycler
import matplotlib.pyplot as plt
import numpy as np
plt.ion()

Task: measuring distance (dissimilarity)

1. Get a dataset from Kaggle or GitHub.
2. Measure the distances d(1,2), d(1,3), and d(1,4) on that dataset.

In [None]:

import pandas as pd
import numpy as np

In [50]:

# Load the dataset from a CSV file hosted on GitHub
dataset_url = "https://raw.githubusercontent.com/prasertcbs/basic-dataset/master/Employee%20data.csv"
df = pd.read_csv(dataset_url)

In [43]:
df

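|     | id    | gender | bdate      | educ | ... | salbegin | jobtime | prevexp | minority |
|-----|-------|--------|------------|------|-----|----------|---------|---------|----------|
| 0   | 1.0   | Male   | 1952-02-03 | 15   | ... | 27000.0  | 98.0    | 144.0   | No       |
| 1   | 2.0   | Male   | 1958-05-23 | 16   | ... | 18750.0  | 98.0    | 36.0    | No       |
| 2   | 3.0   | Female | 1929-07-26 | 12   | ... | 12000.0  | 98.0    | 381.0   | No       |
| 3   | 4.0   | Female | 1947-04-15 | 8    | ... | 13200.0  | 98.0    | 190.0   | No       |
| ... | ...   | ...    | ...        | ...  | ... | ...      | ...     | ...     | ...      |
| 470 | 471.0 | Male   | 1966-08-03 | 15   | ... | 15750.0  | 64.0    | 32.0    | Yes      |
| 471 | 472.0 | Male   | 1966-02-21 | 15   | ... | 15750.0  | 63.0    | 46.0    | No       |
| 472 | 473.0 | Female | 1937-11-25 | 12   | ... | 12750.0  | 63.0    | 139.0   | No       |
| 473 | 474.0 | Female | 1968-11-05 | 12   | ... | 14250.0  | 63.0    | 9.0     | No       |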


In [51]:

# Store the number of columns in the dataset (second entry of df.shape)
number_of_columns = df.shape[1]

In [52]:

# Let pandas display every column; max_rows is set to the same number
pd.set_option('display.max_columns', number_of_columns)
pd.set_option('display.max_rows', number_of_columns)

In [53]:

# Show all columns from dataframe
df.columns

Index(['id', 'gender', 'bdate', 'educ', 'jobcat', 'salary', 'salbegin',
       'jobtime', 'prevexp', 'minority'],
      dtype='object')

In [54]:
df[["id","jobcat", "minority"]].head(5)

|   | id  | jobcat   | minority |
|---|-----|----------|----------|
| 0 | 1.0 | Manager  | No       |
| 1 | 2.0 | Clerical | No       |
| 2 | 3.0 | Clerical | No       |
| 3 | 4.0 | Clerical | No       |
| 4 | 5.0 | Clerical | No       |


In [59]:

# jobcat code
code_jobcat_for_manager = "Manager"
code_jobcat_for_clerical = "Clerical"

# minority code
code_minority_for_yes = "Yes"
code_minority_for_no = "No"

# binary value
value_of_one = 1
value_of_zero = 0

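# Encode 'jobcat' as binary: 1 for Manager, 0 for every other category
def change_code_jobcat_to_biner(jobcat):
    return value_of_one if jobcat == code_jobcat_for_manager else value_of_zero

# Encode 'minority' as binary: 1 for Yes, 0 otherwise
def change_code_minority_to_biner(minority):
    return value_of_one if minority == code_minority_for_yes else value_of_zero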

In [57]:

# Update all values of 'jobcat' series
df["jobcat"] = df["jobcat"].apply(change_code_jobcat_to_biner)

In [60]:
# Update all values of 'minority' series
df["minority"] = df["minority"].apply(change_code_minority_to_biner)
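
The same 0/1 encoding can also be written without `apply`, by comparing a column to the target label and casting the boolean result to an integer. The following is a minimal sketch on a hypothetical toy frame (`toy` is not part of the notebook); it assumes the same rule as above, where only `"Manager"` and `"Yes"` map to 1.

```python
import pandas as pd

# Hypothetical toy frame that mimics the two columns used above
toy = pd.DataFrame({
    "jobcat": ["Manager", "Clerical", "Clerical"],
    "minority": ["No", "Yes", "No"],
})

# A boolean comparison cast to int gives 1 for a match and 0 otherwise
toy["jobcat"] = (toy["jobcat"] == "Manager").astype(int)
toy["minority"] = (toy["minority"] == "Yes").astype(int)

print(toy)
```

Both approaches should give the same columns; the comparison form simply avoids calling a Python function once per row.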

In [61]:
df[["id","jobcat", "minority"]].head(5)

|   | id  | jobcat | minority |
|---|-----|--------|----------|
| 0 | 1.0 | 1      | 0        |
| 1 | 2.0 | 0      | 0        |
| 2 | 3.0 | 0      | 0        |
| 3 | 4.0 | 0      | 0        |
| 4 | 5.0 | 0      | 0        |


In [62]:

# Constant variables
DECREMENT_BY_ONE = 1
INCREMENT_BY_ONE = 1

CONTINGENCY_TABLE_VALUE = {
    "q" : (1,1),
    "r" : (1,0),
    "s" : (0,1),
    "t" : (0,0),
}
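
The four keys follow the usual contingency-table convention for two objects $i$ and $j$ described by binary attributes. Reading each tuple as (value for object $i$, value for object $j$) is an assumption based on how the tuples are compared later in the notebook. Under that reading, the table of counts is:

$$
\begin{array}{c|cc}
 & j = 1 & j = 0 \\ \hline
i = 1 & q & r \\
i = 0 & s & t
\end{array}
$$

With these counts, the two standard binary dissimilarities are

$$
d_{\text{sym}}(i,j) = \frac{r + s}{q + r + s + t},
\qquad
d_{\text{asym}}(i,j) = \frac{r + s}{q + r + s}
$$

The asymmetric form ignores $t$, the attributes on which both objects are 0, so it is undefined when $q + r + s = 0$.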

In [63]:

def get_series(df, idx, series):
    return df.loc[(idx), series]

In [64]:

def get_dissimilarity_dataset(df, series_index = [], series = []):
    first_series = get_series(df, series_index[0], series)
    second_series = get_series(df, series_index[1], series)
    dataset = pd.concat([first_series,second_series],axis=1)
    return dataset.T

In [65]:
get_dissimilarity_dataset(df, [1,2], ["jobcat", "minority"]).T

|          | 1 | 2 |
|----------|---|---|
| jobcat   | 0 | 0 |
| minority | 0 | 0 |


In [66]:
df.loc[0:5, ["jobcat", "minority"]]

|   | jobcat | minority |
|---|--------|----------|
| 0 | 1      | 0        |
| 1 | 0      | 0        |
| 2 | 0      | 0        |
| 3 | 0      | 0        |
| 4 | 0      | 0        |
| 5 | 0      | 0        |


In [75]:

def count_contingency_value(df, start_index = 0, last_index = 1):
    # One counter per contingency cell
    contingency_value = {"q": 0, "r": 0, "s": 0, "t": 0}

    for column in df.columns:
        # Pair of values (first object, second object) for this attribute
        pair = tuple(df.loc[start_index:last_index, column])
        for cell, pattern in CONTINGENCY_TABLE_VALUE.items():
            if pair == pattern:
                contingency_value[cell] += 1

    return contingency_value
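
The counting step above can also be sketched on plain Python sequences, which makes it easy to check by hand. The names `CELL_BY_PAIR` and `count_cells` are hypothetical helpers introduced for this illustration only; the cell labels mirror `CONTINGENCY_TABLE_VALUE`.

```python
from collections import Counter

# Same cell labels as CONTINGENCY_TABLE_VALUE, keyed by the value pair
CELL_BY_PAIR = {(1, 1): "q", (1, 0): "r", (0, 1): "s", (0, 0): "t"}

def count_cells(first, second):
    """Count q, r, s, t for two equal-length sequences of 0/1 values."""
    counts = Counter(CELL_BY_PAIR[(a, b)] for a, b in zip(first, second))
    # Counter omits missing keys, so fill in zeros explicitly
    return {cell: counts.get(cell, 0) for cell in "qrst"}

# Hypothetical objects described by four binary attributes
example = count_cells([1, 1, 0, 0], [1, 0, 1, 0])
print(example)  # {'q': 1, 'r': 1, 's': 1, 't': 1}
```

Each attribute contributes to exactly one cell, so the four counts always sum to the number of attributes.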

In [68]:

# d(1,2)
df_1_2 = get_dissimilarity_dataset(df, [1,2], ["jobcat", "minority"])

In [71]:
c_d_1_2 = count_contingency_value(df_1_2, 1, 2)

In [70]:

# d(1,3)
df_1_3 = get_dissimilarity_dataset(df, [1,3], ["jobcat", "minority"])

In [72]:
c_d_1_3 = count_contingency_value(df_1_3, 1, 3)

In [73]:

# d(1,4)
df_1_4 = get_dissimilarity_dataset(df, [1,4], ["jobcat", "minority"])

In [76]:
c_df_1_4 = count_contingency_value(df_1_4, 1, 4)

In [83]:

def measure_dissimilarity_binary_value_assymetric_distance(contingency_value):

    return (contingency_value["r"] + contingency_value["s"]) / (contingency_value["q"] + contingency_value["r"] + contingency_value["s"])

In [84]:

d_1_2 = measure_dissimilarity_binary_value_assymetric_distance(c_d_1_2)
d_1_3 = measure_dissimilarity_binary_value_assymetric_distance(c_d_1_3)
d_1_4 = measure_dissimilarity_binary_value_assymetric_distance(c_df_1_4)

ZeroDivisionError: ignored
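
The `ZeroDivisionError` above is consistent with the data shown earlier: rows 1 to 4 all have `jobcat` 0 and `minority` 0, so every pair gives q = r = s = 0 and the denominator vanishes. One possible way to handle this, sketched below, is to return NaN for the undefined case. The function name `asymmetric_binary_dissimilarity` and the choice of NaN are assumptions made for this illustration, not part of the original notebook.

```python
import math

def asymmetric_binary_dissimilarity(counts):
    """Return (r + s) / (q + r + s), or NaN when the denominator is zero."""
    denominator = counts["q"] + counts["r"] + counts["s"]
    if denominator == 0:
        # Assumption: treat the undefined case as NaN instead of raising
        return math.nan
    return (counts["r"] + counts["s"]) / denominator

# Both objects are 0 on every attribute, as for rows 1 and 2 above
undefined = asymmetric_binary_dissimilarity({"q": 0, "r": 0, "s": 0, "t": 2})
# Hypothetical counts where the objects differ on one of two non-zero attributes
half = asymmetric_binary_dissimilarity({"q": 1, "r": 1, "s": 0, "t": 0})
print(undefined, half)  # nan 0.5
```

Whether NaN, 0, or an explicit error is the right answer depends on the analysis; the point of the sketch is only that the all-zero case needs an explicit decision.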

In [80]:
d_1_2

1.0

In [81]:
d_1_3

1.0

In [82]:
d_1_4

1.0

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Fixing random state for reproducibility
np.random.seed(19680801)

N = 10
data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]
data = np.array(data).T
cmap = plt.cm.coolwarm
rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))


from matplotlib.lines import Line2D
custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),
                Line2D([0], [0], color=cmap(.5), lw=4),
                Line2D([0], [0], color=cmap(1.), lw=4)]

fig, ax = plt.subplots(figsize=(10, 5))
lines = ax.plot(data)
ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);

There is a lot more that you can do with outputs in your book, such as including interactive outputs.
For more information, see [the Jupyter Book documentation](https://jupyterbook.org).