# Julia for Data Science
Based on work by [@nassarhuda](https://github.com/nassarhuda)!

In this tutorial, we will discuss why *Julia* is the tool you want to use for your data science applications.

We will cover the following:
* **Data**
* Data processing
* Visualization

### Data: Build a strong relationship with your data.
Every data science task has one main ingredient, the _data_! Most likely, you want to use your data to learn something new. But before the _new_ part, what about the data you already have? Let's make sure you can **read** it, **store** it, and **understand** it before you start using it.

Julia makes this step easy with data structures and packages for processing data, as well as existing functions that are readily usable on your data.

The goal of this first part is to get you acquainted with some of Julia's tools for managing your data.

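First, let's download a CSV file from GitHub that we can work with.

Note: in older Julia versions, `download` depends on external tools such as curl, wget or fetch, so you must have one of these installed. Since Julia 1.6, `download` uses the bundled `Downloads` standard library and needs no external tool.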

We can use shell commands like `ls` in Julia by preceding them with a semicolon.

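Add the CSV package to Julia using `Pkg.add("CSV")`. `CSV.read()` defines column headers from the first row of the .csv file by default (the `header=1` setting in recent versions of CSV.jl). In those versions, `CSV.read` also expects a sink type such as `DataFrame` as its second argument.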
We could also use the `DelimitedFiles` package and its `readdlm()` function as shown below.
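As a minimal sketch of the `readdlm()` route, the snippet below writes a tiny stand-in file and reads it back. The file name and its three rows are assumptions made for illustration; they are not the file downloaded above.

```julia
using DelimitedFiles

# Toy stand-in for the downloaded file (assumed layout: year,language).
path = joinpath(mktempdir(), "languages_sample.csv")
write(path, "year,language\n1972,C\n1991,Python\n2012,Julia\n")

# With header=true, readdlm returns a (data, header) tuple.
P, H = readdlm(path, ','; header=true)

size(P)    # (3, 2): three data rows, two columns
H          # 1×2 matrix holding the column names
```

Because the columns mix numbers and text, `P` comes back as a matrix with element type `Any`.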

Here we write our first small function. <br>
Now you can answer questions such as, "when was language X created?"

As expected, this will not return what you want, but thankfully, string manipulation is really easy in Julia!
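One way to make such a lookup robust to capitalization is to lowercase both sides before comparing. The helper name `year_created` and the toy table below are hypothetical, a sketch rather than the function from the original notebook.

```julia
# Toy table in the assumed layout: column 1 = year, column 2 = language.
P = Any[1972 "C"; 1991 "Python"; 2012 "Julia"]

# Hypothetical helper: return the year a language was created, ignoring case.
function year_created(P, language::AbstractString)
    loc = findfirst(name -> lowercase(name) == lowercase(language), P[:, 2])
    loc === nothing && return nothing   # language is not in the table
    return P[loc, 1]
end

year_created(P, "julia")     # 2012
year_created(P, "Fortran")   # nothing
```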

**Reading and writing to files is really easy in Julia.** <br>

You can use different delimiters with the function `readdlm`, available in the `DelimitedFiles` package. (The older `readcsv` was just `readdlm` with a comma delimiter; it was removed in Julia 1.0.) <br>

To write to files, we can use `writedlm` (the older `writecsv` was likewise removed in Julia 1.0). <br>

Let's write this same data to a file with a different delimiter.

We can now check that this worked using a shell command to glance at the file,

and also check that we can use `readdlm` to read our new text file correctly.
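The write-then-read round trip can be sketched as follows. The toy matrix, the file name, and the choice of `;` as the delimiter are assumptions for illustration.

```julia
using DelimitedFiles

# Toy data in the assumed layout: year, language.
P = Any[1972 "C"; 1991 "Python"; 2012 "Julia"]

path = joinpath(mktempdir(), "languages_sample.txt")

# Write with ';' as the delimiter instead of a comma.
writedlm(path, P, ';')

# Glance at the raw text, then read it back with the same delimiter.
first_line = readline(path)     # "1972;C"
P_back = readdlm(path, ';')
```

Pick a delimiter that cannot appear inside the fields themselves; otherwise `readdlm` will split a field in two.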

### Dictionaries
Let's try to store the above data in a dictionary format!

First, let's initialize an empty dictionary.

Here we told Julia that we want `dict` to only accept integers as keys and vectors of strings as values.

However, we could have initialized an empty dictionary without providing this information (depending on our application).

This dictionary takes keys and values of any type!
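The two ways of initializing a dictionary can be sketched side by side. The variable names `typed` and `untyped` are placeholders chosen for this example.

```julia
# Keys must be integers, values must be vectors of strings.
typed = Dict{Int,Vector{String}}()
typed[2012] = ["Julia"]

# No type parameters: keys and values can be anything.
untyped = Dict()
untyped["any key"] = 3.14
untyped[2012] = ["Julia"]

keytype(typed), valtype(typed)       # (Int64, Vector{String})
keytype(untyped), valtype(untyped)   # (Any, Any)
```

With the typed version, an assignment such as `typed["2012"] = ["Julia"]` throws an error, which catches mistakes early. The untyped version is more flexible but gives the compiler less to work with.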

Now, let's populate the dictionary with years as keys and vectors that hold all the programming languages created in each year as their values.

Now you can pick whichever year you want and find what programming languages were invented in that year.
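A minimal sketch of populating and querying the dictionary is shown below, using a small toy table rather than the full dataset. `get!` returns the vector stored for a year, creating an empty one first if the year has not been seen yet.

```julia
# Toy table in the assumed layout: column 1 = year, column 2 = language.
P = Any[1972 "C"; 1972 "Prolog"; 1991 "Python"; 2012 "Julia"]

dict = Dict{Int,Vector{String}}()
for i in axes(P, 1)
    year, language = P[i, 1], P[i, 2]
    # Fetch (or create) the vector for this year, then append to it.
    push!(get!(dict, year, String[]), language)
end

dict[1972]   # ["C", "Prolog"]
```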

### DataFrames! 
*Shout out to R fans!*
One other way to play around with data in Julia is to use a DataFrame.

This requires loading the `DataFrames` package.

You can access columns by header name, or column index.

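In this case, `df[!, 1]` is equivalent to `df[!, :year]` (or `df.year`). Older DataFrames releases wrote these as `df[1]` and `df[:year]`, which no longer work.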

Note that if we want to access columns by header name, we precede the header name with a colon! In Julia, this means that the header names are treated as *symbols*.
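Column access can be sketched on a toy DataFrame as follows. This assumes a recent DataFrames release (1.x) is installed; the column names and values are made up for the example.

```julia
using DataFrames

df = DataFrame(year = [1972, 1991, 2012],
               language = ["C", "Python", "Julia"])

by_name  = df[!, :year]   # the column itself, selected by symbol
by_index = df[!, 1]       # the same column, selected by position
by_prop  = df.year        # property syntax, also the same column
a_copy   = df[:, :year]   # `:` instead of `!` returns a copy
```

The `!` row selector hands back the stored vector without copying, so mutating it changes `df`. Use `:` when you want an independent copy.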

**`DataFrames` provides some handy features when dealing with data**

First, it uses the `missing` value (of type `Missing`) to represent absent data.

Let's see what happens when we try to add `missing` to a number.
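A short sketch of how `missing` behaves in arithmetic and comparisons, using only Base Julia:

```julia
x = 1 + missing             # arithmetic with missing propagates

ismissing(x)                # true
missing == missing          # missing, not true
isequal(missing, missing)   # true
```

Because `==` itself returns `missing`, use `ismissing` or `isequal` when you need a definite `true` or `false`.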

`DataFrames` provides the `describe` function, which gives you quick statistics about each column in your dataframe.
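As a sketch, assuming DataFrames 1.x and a made-up toy table, `describe` returns a new DataFrame with one row per column of the input. Passing statistic names after the dataframe restricts the summary to those statistics.

```julia
using DataFrames

df = DataFrame(year = [1972, 1991, 2012],
               language = ["C", "Python", "Julia"])

summary_all = describe(df)               # default set of statistics
summary_few = describe(df, :min, :max)   # only the requested ones

summary_few.variable   # the column names of df, as symbols
```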

### RDatasets

We can use RDatasets to play around with pre-existing datasets.

Note that data loaded with `dataset` is stored as a DataFrame. 😃
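For instance, assuming the `RDatasets` and `DataFrames` packages are installed, the classic `iris` dataset can be loaded by naming its source package and the dataset:

```julia
using DataFrames, RDatasets

# dataset(package_name, dataset_name) returns a DataFrame.
iris = dataset("datasets", "iris")

size(iris)       # (150, 5)
first(iris, 3)   # peek at the first three rows
```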

The summary we get from `describe` on `iris` gives us a lot more information than the summary on `df`!

### Manage missing values

The handling of the `missing` type was completely reworked in Julia 1.0; [see here for more details](https://docs.julialang.org/en/v1/manual/missing/#Arrays-With-Missing-Values-1).


Missing values ruin everything! 😑

Luckily we can ignore them with `skipmissing`!

In fact, `describe` will drop these values too.

Note that `typeof(calories)` is `Array{Union{Missing, Int64},1}`
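The points above can be sketched with a small vector. The numbers here are invented stand-ins, not the values from the tutorial's dataset.

```julia
using Statistics

# Hypothetical values with one missing entry.
calories = [47, missing, 105, 52]

typeof(calories)              # Vector{Union{Missing, Int64}}
mean(calories)                # missing: one missing value poisons the result
mean(skipmissing(calories))   # 68.0: computed over the three known values
```

`skipmissing` returns a lazy iterator, so no copy of the data is made.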


We can replace all missing values with a default value, e.g. 0.
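One way to do this, sketched on invented values, is to broadcast `coalesce`, which returns its first non-missing argument:

```julia
# Hypothetical values with one missing entry.
calories = [47, missing, 105, 52]

# Replace every missing entry with 0; other entries pass through unchanged.
filled = coalesce.(calories, 0)

eltype(filled)   # Int64: the result no longer allows missing
```

Whether 0 is a sensible default depends on the analysis, since it will pull down statistics such as the mean.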

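We can also join two dataframes together. In recent DataFrames releases, the single `join` function has been replaced by `innerjoin`, `leftjoin`, `rightjoin`, `outerjoin`, and related functions.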
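A minimal sketch of joining on a shared key column, assuming DataFrames 1.x. Both tables and all their values are invented for illustration.

```julia
using DataFrames

# Hypothetical tables sharing an `item` column.
foods  = DataFrame(item = ["apple", "banana", "cucumber"],
                   calories = [105, 47, missing])
prices = DataFrame(item = ["apple", "banana", "fig"],
                   price = [0.85, 0.30, 1.10])

# Keep only items that appear in both tables.
both = innerjoin(foods, prices, on = :item)

# Keep every row of `foods`; items without a price get `missing`.
all_foods = leftjoin(foods, prices, on = :item)
```

Here `both` has two rows (apple and banana), while `all_foods` keeps all three foods.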

### FileIO

Again, let's check that this download worked!

Next, let's load the Julia logo, stored as a .png file.

We see below that Julia stores this logo as an array of colors.