# Companion Notebook for _Beginning Data Science with Jupyter Notebook and Kotlin_

This Jupyter Notebook contains all the code examples featured in the [RayWenderlich.com](https://www.raywenderlich.com/) article, [_Beginning Data Science with Jupyter Notebook and Kotlin_](https://www.raywenderlich.com/27470499-beginning-data-science-with-jupyter-notebook-and-kotlin). The article is a follow-up to [_Create Your Own Kotlin Playground (and Get a Data Science Head Start) with Jupyter Notebook_](https://www.raywenderlich.com/27470305-create-your-own-kotlin-playground-and-get-a-data-science-head-start-with-jupyter-notebook).


## Requirements

To use this notebook, you’ll need the following installed on your system, in this order:

1. Python 3.8 or later
2. Jupyter Notebook
3. The Kotlin kernel for Jupyter Notebook

The [Getting Started](https://www.raywenderlich.com/27470305-create-your-own-kotlin-playground-and-get-a-data-science-head-start-with-jupyter-notebook#toc-anchor-003) section of the prior article, [_Create Your Own Kotlin Playground (and Get a Data Science Head Start) with Jupyter Notebook_](https://www.raywenderlich.com/27470305-create-your-own-kotlin-playground-and-get-a-data-science-head-start-with-jupyter-notebook), shows you a way to get the above requirements quickly. In summary, the process is:

1. Install Python and Jupyter Notebook by installing the [Individual Edition](https://www.anaconda.com/products/individual) of Anaconda Python.
2. Install the Kotlin kernel by entering the following on the command line (this works for Linux, macOS, and Windows): 

```
conda install -c jetbrains kotlin-jupyter-kernel
```

## Importing _krangl_

The cell below imports the _krangl_ library. _krangl_ takes its name from “***K***otlin library for data [w***rangl***ing](https://en.wikipedia.org/wiki/Wrangler_(profession))”. Its design borrows heavily from two R libraries: dplyr and purrr. krangl provides a subset of classes, methods, and properties with the same or similar names as those in these libraries. This comes in handy because there’s far more documentation and literature on those libraries than on krangl, at least for now.

In [None]:
%use krangl

## _krangl’s_ Built-In Data Frames

Finding good datasets (collections of data) to work with is a major obstacle for people who are just getting started with data science. To counter this problem, krangl provides three datasets as built-in `DataFrame` instances.

Going from smallest to largest, these built-in data frames are:

* **`sleepData`**: Data about the sleep patterns of 83 species of animal, including humans.
* **`irisData`**: A set of measurements of 150 flowers from different species of iris.
* **`flightsData`**: Over 330,000 rows of data about flights taking off from New York City airports in 2013.

Most of the exercises in this notebook will use `sleepData`.
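
To get a feel for the relative sizes of these data frames before working with them, a cell like the following can report their dimensions. This is a minimal sketch that assumes the `%use krangl` cell above has already run. It relies on the `nrow`, `ncol`, and `names` properties of krangl's `DataFrame`; the `builtIns` name is purely illustrative.

```kotlin
// Pair each built-in data frame with a label so the output is readable.
val builtIns = listOf(
    "sleepData" to sleepData,
    "irisData" to irisData,
    "flightsData" to flightsData
)

// Report the number of rows and columns in each data frame.
for ((label, dataFrame) in builtIns) {
    println("$label: ${dataFrame.nrow} rows, ${dataFrame.ncol} columns")
}

// List the column names of the data frame used in most exercises.
println(sleepData.names)
```

The row counts should line up with the descriptions in the list above, for example 83 rows for `sleepData`.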

## Basic Data Operations

### Getting a Data Frame’s First and Last Rows

In [None]:
sleepData

In [None]:
sleepData.head()

In [None]:
val lastFew = sleepData.tail()
lastFew
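
Called without arguments, `head()` and `tail()` return a small default number of rows. As a short sketch, assuming the krangl version in use accepts an optional row count for both functions, you can ask for a specific number of rows instead. The variable names are illustrative.

```kotlin
// Request the first three rows instead of the default number.
val firstThree = sleepData.head(3)

// Request the last three rows.
val lastThree = sleepData.tail(3)

println("head(3) returned ${firstThree.nrow} rows; tail(3) returned ${lastThree.nrow} rows.")
firstThree
```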

### Extracting a `slice()` from the Data Frame

In [None]:
val selection = sleepData.slice(30..34)
selection

### `slice()` Indexes Start at 1, not 0

In [None]:
sleepData.rows.elementAt(0)

In [None]:
// slice() counts rows from 1, so row 0 matches nothing
sleepData.slice(0..0)

In [None]:
sleepData.slice(1..1)

In [None]:
sleepData.slice(1..2)
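
Because Kotlin collections such as `rows` are zero-indexed while `slice()` counts rows from 1, it's easy to be off by one when mixing the two. The following sketch wraps the conversion in a hypothetical extension function, `sliceZeroBased()`, which is not part of krangl. It simply shifts a zero-based range up by one before calling `slice()`, and it assumes the `%use krangl` cell has already run so that `DataFrame` is in scope.

```kotlin
// Hypothetical helper (not part of krangl): accepts a zero-based range,
// like the indexes used by Kotlin collections, and shifts it to the
// one-based row numbers that slice() expects.
fun DataFrame.sliceZeroBased(range: IntRange): DataFrame =
    slice((range.first + 1)..(range.last + 1))

// Zero-based index 0 and one-based row 1 refer to the same animal.
val firstByIndex = sleepData.rows.elementAt(0)["name"]
val firstBySlice = sleepData.sliceZeroBased(0..0).rows.first()["name"]
println("rows.elementAt(0): $firstByIndex, sliceZeroBased(0..0): $firstBySlice")
```

Whether a helper like this is worth keeping depends on how often you move between row collections and slices; for occasional use, adding 1 by hand is just as clear.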

## Sorting

In [None]:
sleepData.sortedBy("sleep_total")

In [None]:
sleepData.sortedBy("sleep_total").slice(10..15)

In [None]:
sleepData.sortedBy("sleep_total").slice(16..20)

In [None]:
sleepData.sortedBy("sleep_total", "bodywt")

In [None]:
sleepData.sortedByDescending("sleep_total")

## Filtering

In [None]:
val herbivores = sleepData.filter { it["vore"] isEqualTo "herbi" }
println("This dataset has ${herbivores.nrow} herbivores.")
herbivores

In [None]:
val heavyHerbivores = herbivores.filter { it["bodywt"] ge 200 }
heavyHerbivores

### Filtering With Negation

In [None]:
val nonHerbivores = sleepData.filter { (it["vore"] eq "herbi").not() }
nonHerbivores

### Filtering With Multiple Criteria

In [None]:
val alsoHeavyHerbivores = sleepData
    .filter { 
        (it["vore"] eq "herbi") AND 
        (it["bodywt"] gt 30)
    }
alsoHeavyHerbivores
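
Criteria can also be combined so that a row passes when either condition holds. The sketch below assumes krangl's `OR` infix operator, the counterpart of `AND`, and assumes that `"carni"` and `"omni"` are among the values in the `vore` column. The `meatEaters` name is illustrative.

```kotlin
// Keep animals that are either carnivores or omnivores.
val meatEaters = sleepData
    .filter {
        (it["vore"] eq "carni") OR
        (it["vore"] eq "omni")
    }
println("This dataset has ${meatEaters.nrow} carnivores and omnivores.")
meatEaters
```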

### Fancier Text Filtering

In [None]:
val monkeys = sleepData.filter { it["name"].isMatching<String> { contains("monkey") } } 
monkeys

## Removing Columns

In [None]:
val simplifiedSleepData = sleepData.select("name", "vore", "sleep_total", "sleep_rem")
simplifiedSleepData

In [None]:
val evenSimplerSleepData = simplifiedSleepData.remove("sleep_rem")
evenSimplerSleepData

## More Complex Data Operations

### Calculating Column Statistics

In [None]:
val sleepCol = sleepData["sleep_total"]
println("The mean sleep period is ${sleepCol.mean(removeNA=true)} hours.")
println("The median sleep period is ${sleepCol.median(removeNA=true)} hours.")
println("The standard deviation for the sleep periods is ${sleepCol.sd(removeNA=true)} hours.")
println("The shortest sleep period is ${sleepCol.min(removeNA=true)} hours.")
println("The longest sleep period is ${sleepCol.max(removeNA=true)} hours.")
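
The `removeNA=true` argument asks each function to skip missing (NA) values when computing its result. To make that idea concrete, here is a plain-Kotlin sketch that computes the same kinds of statistics over a list with one missing entry. It is an illustration only, not krangl's implementation: the helper names are hypothetical, missing values are modeled as `null`, the sample data is made up, and the standard deviation uses the sample (n - 1) formula, which is an assumption about what `sd()` reports.

```kotlin
import kotlin.math.sqrt

// Hypothetical helpers for illustration; krangl's own functions live on its column type.
fun List<Double?>.meanSkippingMissing(): Double = filterNotNull().average()

fun List<Double?>.medianSkippingMissing(): Double {
    val sorted = filterNotNull().sorted()
    val middle = sorted.size / 2
    // Average the two middle values when the count is even.
    return if (sorted.size % 2 == 0) (sorted[middle - 1] + sorted[middle]) / 2 else sorted[middle]
}

fun List<Double?>.sampleSdSkippingMissing(): Double {
    val present = filterNotNull()
    val mean = present.average()
    // Sample standard deviation: divide the squared deviations by n - 1.
    return sqrt(present.map { (it - mean) * (it - mean) }.sum() / (present.size - 1))
}

// Made-up hours of sleep, with one missing measurement.
val hours = listOf(2.0, 4.0, null, 6.0)
println("Mean: ${hours.meanSkippingMissing()}")
println("Median: ${hours.medianSkippingMissing()}")
println("Standard deviation: ${hours.sampleSdSkippingMissing()}")
```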

### Grouping

In [None]:
val groupedData = sleepData.groupBy("vore")

In [None]:
groupedData

In [None]:
groupedData.groups()

In [None]:
groupedData.count()

In [None]:
// Sorting a grouped data frame sorts the rows within each group
val sortedGroupedData = groupedData.sortedBy("name")
sortedGroupedData.groups()
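
Each element returned by `groups()` holds the rows that share one `vore` value. As a hedged sketch, assuming `groups()` returns a list of `DataFrame` instances and that `groupedData` from the cell above is still in scope, you can walk the groups and report the size of each one. The `diet` name is illustrative.

```kotlin
// Print the diet shared by each group along with the number of rows in it.
for (group in groupedData.groups()) {
    // Every row in a group has the same "vore" value, so the first row is enough.
    val diet = group.rows.first()["vore"]
    println("$diet: ${group.nrow} animals")
}
```

This gives the same information as `count()`, but shows that each group is an ordinary data frame you can filter, sort, or summarize on its own.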

## Summarizing

In [None]:
groupedData
    .summarize(
        "Mean daily total sleep (hours)" to { it["sleep_total"].mean(removeNA=true) },
        "Mean daily REM sleep (hours)" to { it["sleep_rem"].mean(removeNA=true) }
    )

In [None]:
groupedData
    .summarize(
        "Mean daily total sleep (hours)" to { it["sleep_total"].mean(removeNA=true) },
        "Mean daily REM sleep (hours)" to { it["sleep_rem"].mean(removeNA=true) }
    )
    .sortedBy("Mean daily total sleep (hours)")

## Importing Data

### Reading .csv Data

In [None]:
val ramenRatings = DataFrame.readCSV("https://koenig-media.raywenderlich.com/uploads/2021/07/ramen-ratings.csv")
ramenRatings

### Reading .tsv Data

In [None]:
val restaurantReviews = DataFrame.readTSV("https://koenig-media.raywenderlich.com/uploads/2021/07/restaurant-reviews.tsv")
restaurantReviews