# What are Python, Jupyter and Anaconda?

Python, Jupyter and Anaconda are three distinct tools that serve different purposes.

* Python is the software that performs the actual instructions. Without Python installed on your computer or server, you would not be able to run any commands.

* Jupyter, used through either Jupyter Notebooks or JupyterLab, is the software that provides an interface to Python. It’s sometimes referred to as an Integrated Development Environment (IDE). Its purpose is to provide features that make Python easier to use and access. Other IDEs are also available for Python, though Jupyter is one of the most popular for data analysis.

* Anaconda is a Python distribution: it bundles Python with Jupyter, other IDEs and many common packages, so that everything can be accessed from one place.

# Where do I download Python?

The easiest way to access and work with Python is through Anaconda. 

If you need Anaconda on your work laptop, you will need to raise a service request. 

To download Anaconda on a personal laptop, use the following link: https://www.anaconda.com/.

# Getting started

To launch Anaconda, use the 'Anaconda Navigator'. After launching, you should see a screen similar to the one below.

<div>
<img src="Images/Anaconda.png" width="500"/>
</div>

Now click on 'launch' for JupyterLab to load up an instance of Jupyter. Once open, you should be greeted with something like the below.

<div>
<img src="Images/Jupyter.png" width="750"/>
</div>

## Opening and saving a new script

Now on the left hand side, simply navigate to the folder you would like to create a Python file in and then click on the Python3 button under 'Notebook' in the middle. Once done, you should get a screen like so.

<div>
<img src="Images/Notebook.png" width="500"/>
</div>

Note that all files save automatically. You can also save by clicking the floppy disk icon on the top bar.

Here we are presented with an empty box in the middle known as a 'code chunk' (Jupyter itself calls this a 'cell'), where we write our code.

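For example, if you type `2+2` in the box and run it, either by pressing the play button or by using Shift+Enter, you will get 4. Once the chunk has run, a new chunk appears automatically below it. (Ctrl+Enter also runs the chunk, but stays on it without creating a new one.) You can also click the '+' button to create additional boxes.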

Python has many built-in functions you can use to perform specific tasks, but additional packages are available that can make these tasks easier to perform or add functionality beyond the base functions.

In the script you can write notes to go with your code. If you add `#` in front of any text, Python treats the rest of that line as a note (a comment) and does not run it.
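As a minimal sketch of the two ideas above, the chunk below runs a simple calculation and uses `#` to add notes. The variable name `result` is just an illustrative choice.

```python
# Anything after a '#' is a note: Python ignores it when the chunk runs.
result = 2 + 2  # add two numbers and store the answer in a variable

# print() shows the stored value underneath the chunk.
print(result)
```

In a notebook, typing `2 + 2` on its own as the last line of a chunk displays the answer without needing `print()`.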

# Packages

## What are packages? 
Packages are collections of Python functions, data, and compiled code in a well-defined format, created to add specific functionality. The Python Package Index (PyPI) hosts hundreds of thousands of user-contributed packages, and the number keeps growing.

## How to install a package
Packages for Python can be installed using a program called `pip`.

* First, navigate back to the launcher by clicking the blue '+' in the top left, just above your files.
* Then, open the 'terminal'. 
* Finally, type `pip install package-name`, swapping "package-name" for the name of the package you want to install.

The most commonly used package for data manipulation is `pandas`. Let's install pandas by typing `pip install pandas` in the terminal.
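If you are not sure whether a package is already installed, one way to check from a code chunk is sketched below. The helper name `is_installed` is hypothetical (it is not part of Python or pip); it relies on the standard library function `importlib.util.find_spec`, which returns `None` when Python cannot find a top-level package with the given name.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# 'json' ships with Python, so it is always available.
print(is_installed("json"))

# A made-up name that should not exist on any machine.
print(is_installed("a_package_that_does_not_exist"))
```

You could call `is_installed("pandas")` in the same way after running `pip install pandas`.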

## Using a package
Once installed, you need to use the `import` statement to load the package you want to use in your script. For this demo, we will be using the following package:

```python
import pandas as pd
```

## Getting help with a package

Within a package there are functions, which are “self-contained” blocks of code that accomplish a specific task. Functions usually take in some sort of data structure (a value, a list, a dataframe, etc.), process it, and return a result.

For each package, there is documentation produced which provides information on the different functions it contains and gives examples of how to use the function. These can be accessed by simply googling the documentation for the package. For example, here is the documentation for `pandas`: https://pandas.pydata.org/docs/user_guide/index.html#user-guide.
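Documentation can also be read without leaving the notebook. The sketch below uses the standard library `math` module purely as an example: most functions carry a short description (a "docstring") that Python's built-in `help()` displays.

```python
import math

# Every documented function stores its description in the __doc__ attribute.
description = math.sqrt.__doc__
print(description)

# help() prints a fuller help page for the same function.
help(math.sqrt)
```

The same approach works for package functions, for example `help(pd.read_csv)` once pandas has been imported as `pd`.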


# Working directory
The working directory is the file path for the directory you are working in on your computer. This is the location Python will look in for any files you are trying to read in. It is also where Python will store any output you save by default.  Note: you can specify the file path when loading/saving if the working directory is not what you want/need.

## Set working directory

You can set the working directory to a specific folder on your computer, for example a project folder where you have saved data that you would like to load into Python. **Following the above example, you have already done this, by navigating to the desired folder using the bar on the left hand side.**
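To confirm which folder Python is actually using, you can ask it directly. This is a small sketch using the standard library; the file name `Mydata.csv` is only a placeholder and does not need to exist.

```python
import os
from pathlib import Path

# Where is Python currently looking for files?
current_dir = Path.cwd()
print(current_dir)

# Build the full path to a file inside the working directory.
# "Mydata.csv" is a hypothetical file name used for illustration.
data_file = current_dir / "Mydata.csv"
print(data_file.exists())  # True only if that file is really there

# os.chdir() changes the working directory. Here it is pointed at the
# folder we are already in, so nothing actually moves.
os.chdir(current_dir)
```

To switch to a different folder, pass its path to `os.chdir()` instead of `current_dir`.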

# Loading Data into R

You can load many data formats into R. This session will cover reading in CSV and Excel files, as these are the two most common file types you might want to load.

## Load a CSV
The `read_csv` function from the readr package (part of the tidyverse) reads in CSV files (see code below). If you have set your working directory to the folder your data is saved in, you only need to put the name of the CSV file, in quotes, inside the brackets. When you load the data you also need to give the dataframe you are creating in R a name. For example, to import a CSV file called Mydata and call the table first_table, you would use the following code. The name you want to give the table in R comes first, followed by `<-`.

```{r read csv from WD}
first_table <- read_csv("Mydata.csv")
```

If the data you want to load is in a different location to your working directory, you need to include the file path to where the data is saved. For example, to load a CSV called Mydata2 that is saved in a folder called CSVS in your personal folder, and call it second_table, you would use the following code. Both the file path and the name of the CSV need to be included inside the quotation marks. REMEMBER that the slashes in the file path should be forward slashes (`/`), not backslashes.

```{r read CSV from different location}
second_table <- read_csv("C:/Users/cypher/CSVS/Mydata2.csv")
```

Once your data has been loaded you will see it in the environment window in the top right of your screen. This tells you how many rows the table has (obs, short for observations) and how many columns (variables). If you click on the name of the table you can then view the table.

![](C:/Users/nidod/Documents/GitHub/coffee-and-coding/2022-08-17 Using R for the first time/Images/first_table.png)
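For readers following along in the Jupyter setup described earlier, a rough pandas counterpart of the CSV steps above is sketched here. It assumes pandas is installed; the file contents and column names are made up, and the CSV is written to a temporary folder only so that the example runs on any machine.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a tiny CSV in a temporary folder (made-up data for illustration).
folder = Path(tempfile.mkdtemp())
csv_path = folder / "Mydata.csv"
csv_path.write_text("name,score\nA,1\nB,2\n")

# Read the CSV into a dataframe, giving the full path to the file.
first_table = pd.read_csv(csv_path)

# shape gives (number of rows, number of columns).
print(first_table.shape)

# head() shows the first few rows of the table.
print(first_table.head())
```

If the file is in your working directory, the file name alone is enough, for example `pd.read_csv("Mydata.csv")`.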

## Load an excel file

To load an Excel file you can use a package called readxl (alternative packages are also available). If you have installed the tidyverse, readxl is one of the packages included, although it still needs to be loaded separately with `library(readxl)`. As above, if you have set your working directory to the folder your Excel spreadsheet is saved in, you just need to include the name of the Excel file in the brackets. For example, to import Mydata3 and call it third_table, you would use the following code.

```{r read excel from WD}
third_table <- read_excel("Mydata3.xlsx")
```

Again, as above, if the Excel spreadsheet is saved in a different location to your working directory, you need to include the file path to where the data is saved. For example, to load an Excel spreadsheet called Mydata4 that is saved in a folder called EXCEL in your personal folder, and call it fourth_table, you would use the following code.

```{r read excel from different location}
fourth_table <- read_excel("C:/Users/Cypher/EXCEL/Mydata4.xlsx")
```

# Simple analysis in R

For this section we are going to use one of the data sets pre-built in R, called mtcars, but any of the following analysis could be done on data sets you have loaded yourself. To load the data we use the following code. The functions used in this section are from the dplyr package, which is installed as part of the tidyverse.

```{r create data set}
mtcars <- mtcars
```

## Sum up a column

If you want to get the sum of a column you can use the `summarise` function in dplyr. The following code will give you an output item with the sum of all of the cylinders of the 32 cars in the mtcars data set. In this example we are creating a data object called num_cyl to keep a record of the value. We also need to use a pipe to pass the data to the `summarise` function. The pipe is `%>%` (you might also see it written as `|>`, the pipe built into R from version 4.1). In RStudio, the keyboard shortcut to create the pipe symbol is Ctrl+Shift+M.

```{r sum total number of cylinders}
num_cyl <- mtcars %>% 
  summarise(sum(cyl))
```
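For comparison, a rough pandas counterpart of summing a column is sketched below. It assumes pandas is installed. The small `cars` table is made up for illustration and only borrows the column names `cyl` and `mpg`; it is not the real mtcars data.

```python
import pandas as pd

# A made-up table with the same column names as mtcars (not the real data).
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 8],
    "mpg": [32.0, 24.0, 21.0, 15.0],
})

# Select the column by name, then add up its values.
num_cyl = cars["cyl"].sum()
print(num_cyl)
```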

## Filter rows

If you want to filter rows from the table, for example to look only at cars with 4 cylinders, the following code can be used. A double equals sign (`==`) is used for an exact match.

```{r only cars with 4 cylinders}
Cylinders4 <- mtcars %>% 
  filter(cyl == 4)
```

If you want to filter on more than one column, for example if we want cars with 4 cylinders that get more than 30 mpg, then the following code can be used.

```{r cylinders and mpg}
Cylinders4_mpgn <- mtcars %>% 
  filter(cyl == 4 & mpg > 30)
```
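The two filters above can be sketched in pandas as follows, assuming pandas is installed. The `cars` table is made up for illustration and is not the real mtcars data. Note that when combining conditions in pandas, each condition needs its own brackets.

```python
import pandas as pd

# A made-up table with the same column names as mtcars (not the real data).
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 8],
    "mpg": [32.0, 24.0, 21.0, 15.0],
})

# Keep only the rows where cyl is exactly 4.
cylinders4 = cars[cars["cyl"] == 4]

# Keep rows that meet both conditions: 4 cylinders and more than 30 mpg.
cylinders4_mpg = cars[(cars["cyl"] == 4) & (cars["mpg"] > 30)]

print(cylinders4)
print(cylinders4_mpg)
```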

## Grouping data

If you want to count the number of cars by the number of cylinders they have, you can use the following code. In `group_by` you add the name of the column you want to group by, in this example the number of cylinders (cyl). The `summarise` function is then used to count the number of cars: in the brackets you add the name you want the new column to have, and `= n()` counts the number of rows for each number of cylinders.

```{r group by}
Num_cars <- mtcars %>% 
  group_by(cyl) %>% 
  summarise(number_of_cars = n())
```
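A rough pandas counterpart of the grouped count is sketched below, assuming pandas is installed. Again the `cars` table is made up and is not the real mtcars data.

```python
import pandas as pd

# A made-up table with the same column names as mtcars (not the real data).
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 8],
    "mpg": [32.0, 24.0, 21.0, 15.0],
})

# Group the rows by cyl, count the rows in each group, and turn the
# result back into a table with a column called number_of_cars.
num_cars = cars.groupby("cyl").size().reset_index(name="number_of_cars")
print(num_cars)
```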

# Drawing a graph

We will use the ggplot2 package to draw a bar chart showing the number of cars by the number of cylinders they have. `geom_bar` is used to draw a bar chart; if you wanted to draw a line graph instead, this would be set to `geom_line`. This session just covers the very basics of drawing a graph. At the end of the document there is a link to a cheat sheet which contains example code and details of other options and parameters you can change and include in graphs.

```{r ggplot graph}
ggplot(data=Num_cars, aes(x= cyl, y= number_of_cars)) +
  geom_bar(stat="identity")
```

We can see the scale for the cylinders is continuous and not specific to the 3 cylinder sizes we have in our data. This happens when you plot a number on the x axis. To correct this, we need to change the data type of this field to a factor, so that each value is recognised as its own category. This can be done using the `mutate` function.

```{r}
Num_cars <- Num_cars %>% 
 mutate(cyl= as.factor(cyl))
```

If you then replot the graph the scale of the x axis is now correct.

```{r}
ggplot(data=Num_cars, aes(x= cyl, y= number_of_cars)) +
  geom_bar(stat="identity")
```

## Change the colour of the bars

If you want to change the colour of your bars, you can specify a colour; this example uses steelblue. By setting the `fill` argument inside `geom_bar` you can manually select the colour. You can view the colours available for ggplot2 here: https://r-graph-gallery.com/ggplot2-color.html

```{r}
ggplot(data=Num_cars, aes(x= cyl, y= number_of_cars)) +
  geom_bar(stat="identity", fill="steelblue")
```

If you want your bars to be different colours, you can map `fill` to a column inside `aes()` and ggplot2 will use a default set of colours. There are also different colour palettes available, and you can create your own colour themes.

```{r}
ggplot(data=Num_cars, aes(x= cyl, y= number_of_cars, fill = cyl)) +
  geom_bar(stat="identity")
```

## Adding a title to the Graph

You can give your graph a title by using ggtitle. You can change the location and size of the title but we won't be covering that in this session. 

```{r}
ggplot(data=Num_cars, aes(x= cyl, y= number_of_cars, fill = cyl)) +
  geom_bar(stat="identity") +
  ggtitle("Number of cars by number of Cylinders")
```

## Add axis labels

You can also change the axis labels using `xlab` and `ylab`.

```{r}
ggplot(data=Num_cars, aes(x= cyl, y= number_of_cars, fill = cyl)) +
  geom_bar(stat="identity") +
  ggtitle("Number of cars by number of Cylinders") +
  xlab("Number of Cylinders") +
  ylab("Number of Cars")
```
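In Python, a similar bar chart could be drawn with the matplotlib package. This is a sketch under the assumption that matplotlib is installed (it is not used elsewhere in this document), and the counts are made up rather than taken from mtcars. Passing the cylinder sizes as text plays the same role as converting them to a factor: each value gets its own bar instead of a position on a continuous scale.

```python
import matplotlib.pyplot as plt

# Made-up counts for illustration (not the real mtcars figures).
cyl_labels = ["4", "6", "8"]  # text labels make the x axis categorical
number_of_cars = [2, 1, 1]

# Create an empty figure and draw the bars in a single colour.
fig, ax = plt.subplots()
bars = ax.bar(cyl_labels, number_of_cars, color="steelblue")

# Add a title and axis labels.
ax.set_title("Number of cars by number of Cylinders")
ax.set_xlabel("Number of Cylinders")
ax.set_ylabel("Number of Cars")
```

In a notebook the chart is displayed underneath the chunk once it has run.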

# Useful resources

https://www.rstudio.com/resources/cheatsheets/

https://stackoverflow.com/

https://www.datacamp.com/courses/free-introduction-to-r

