# Week 5 - Let's get down to business!

Hello and welcome back to week 5 of our Python for SPSS users workshops.

This is the first session where you will actually start to work with data in Python. Before we start, we just want to remind you that, even though there will be nuances to the `syntax` we use when working with data, we have tried to introduce you to the fundamentals of each step. We'll do our best to point you to earlier resources or other online resources that you can use as you go.

With that in mind, rather than recapping anything this week, we're going to jump straight in, but the earlier workbooks and Word documents are always there for you to go back to.

So... let's do this.

## Part 1 - set up and workflow

OK, the first part of any project is the setup. We can't stress enough how important it is to have a consistent workflow when working on a project, as anyone who has ever saved an assignment and then forgotten where they saved it will tell you. What we're going to quickly go through here is **an** example of how to do this. You might find other ways that suit you better, making small changes or laying things out differently, but these steps are the bare minimum: you can (and should) add things if they seem useful, but there would need to be a very good reason to skip any of them.

### Create your directory
A directory is another name for a 'folder' on your computer, and every project you work on should have its own directory where everything connected to that project is stored. We covered this in week 1 when we asked you to create a directory called something like 'Programming club'. This really should be the first step in any project, even if you're just playing around with some code.

#### Create your subdirectories
A sub-directory is just a folder that exists within another folder (sometimes you'll see them called sub-folders). We use them to help us keep track of a project and to make sure there is some reasonable order to the files it needs. In general, it is good practice to have:

 1. A folder that contains all of your code - this is generally called 'src'
 2. A folder that contains all of your data, starting with your raw data files - this is called (in a surprising twist) 'data'
 3. A folder that contains any output from your work, like text files, or graphs/figures - this is called 'output'

Can you add other folders? Of course. Can you put folders within folders (to separate old files from new files, for example)? Again, please do! These directories are there to help you, so set them up as you see fit. [This paper](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510) is a really good place to look for guidance on how to organise your project. We suggest you give it a read at some point, but honestly, until you've got a couple of small projects under your belt, a lot of it won't really make sense to you.

In week 1 of these workshops we got you to set up a directory and sub-directories for these workshops. We have been encouraging you to save the notebooks for each week in the 'src' folder and any of the preparatory Word docs in a 'preparation' folder, so you can see that we have already added a 'preparation' folder to the three basic sub-folders. Again, make it work for you. You can always add folders later in a project if they're needed, or move things into other sub-directories, but starting out with a structure in place is so valuable (trust us).

You don't need to do this again for these workshops, since we already had you do it in week 1, but we wanted to make it explicit. It's an important first step.
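If you'd like to set up the same structure for a future project from Python rather than by hand, a minimal sketch using the standard library's `pathlib` might look like the one below. The function name `create_project` and the folder name `my_new_project` are just illustrative placeholders, not something from the workshop materials.

```python
from pathlib import Path


def create_project(root, subfolders=("src", "data", "output")):
    """Create a project folder and its sub-folders, skipping any that already exist."""
    root = Path(root)
    for name in subfolders:
        # parents=True also creates the project folder itself if it is missing;
        # exist_ok=True means re-running this never raises an error.
        (root / name).mkdir(parents=True, exist_ok=True)
    # Return the sub-folder names so you can see what is now there.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # 'my_new_project' is a placeholder; 'preparation' is the extra folder
    # we use in these workshops.
    print(create_project("my_new_project", ("src", "data", "output", "preparation")))
```

Because of `exist_ok=True`, running it on a project that already has some of these folders just fills in the missing ones.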

**In the future you will come across something called 'Git' or 'version control', and we encourage you to learn about it and a site called 'github.com'. It's really great, but we didn't want to overburden you with new stuff too early. If you're interested, we can try to run a session on it, or you can check out the [git for poets](https://www.youtube.com/watch?v=BCQHnlnPusY&list=PLozRqGzj97d02YjR5JVqDwN2K0cAiT7VK) YouTube tutorial, which is a really good start.**


### Create your virtual environment

**We. Can't. Stress. Enough.** How important it is to use `virtual environments` (also called `venvs`) in any Python project, data project or otherwise. We covered this in the week 4 sessions, but just to reiterate the point: Python and the modules we use in our projects are always growing and changing. New functionality is always being added and existing functionality is always being improved, which means that over time, code you have written might stop working. This doesn't happen often, and it never happens out of the blue, but it does happen. Imagine going back to a project after the summer holidays, for example, or switching to a new device after a significant update to Python, and needing to go back through all your code to find `deprecated` (outdated) code that no longer works and rewrite all of it to get your results again. Virtual environments help to avoid this by keeping each project's Python and module versions separate, so they only change when you choose to update them.

If you haven't set up a virtual environment yet, we sent out a document called 'Setting up a Virtual Environment with Anaconda.docx' that you should work through now, so that you have one for these workshops, but if you set one up last week, you can just continue to use that one. Don't be afraid to ask questions in the lab sessions (or drop us an email) if you need more help with it. 
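Once your environment is set up, one quick sanity check is to ask Python which interpreter is actually running your code. This is a small sketch using only the standard `sys` module; the helper name `describe_interpreter` is our own placeholder. When your environment is active (or your notebook kernel is using it), the printed path should normally include your environment's folder name, although the exact path depends on your operating system and where Anaconda is installed.

```python
import sys


def describe_interpreter() -> str:
    """Return the Python version and the path of the interpreter running this code."""
    # sys.version starts with the version number, e.g. "3.11.4 (main, ...)",
    # so we keep only the first word.
    py_version = sys.version.split()[0]
    # sys.executable is the full path to the running Python program; inside an
    # environment it normally points into that environment's folder.
    return f"Python {py_version} at {sys.executable}"


print(describe_interpreter())
```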

#### Installing modules (also called libraries or packages)
Generally, when you are working on a project there will be `modules` that you know you'll need before you start, and you want to make sure those modules are installed when you create the `virtual environment`. For example, when working on an SPSS-like data project you'll need `pandas`, `matplotlib` and `scipy.stats` (there are others, but these are the ones we reach for most often). `scipy.stats` is part of the `scipy` package, so `scipy` is the name you install. As we outlined in the 'Setting up a Virtual Environment with Anaconda.docx' guide, we would just install them with pip:

```
pip install pandas matplotlib scipy
```

You can always go back and install more later by reactivating your `venv` and just running another `pip` call, and this will happen fairly often when you start out, but it's a good idea to just install the ones you know you'll need at the outset. 
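Because the whole point of a `venv` is to pin down which versions you're using, it can also help to note down the versions you installed. The sketch below uses the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_versions` is a hypothetical placeholder, and the package list just mirrors the three mentioned above. Running `pip freeze` in your activated environment gives you a similar, fuller list.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the environment running this code.
            versions[name] = None
    return versions


for name, ver in installed_versions(["pandas", "matplotlib", "scipy"]).items():
    print(f"{name}: {ver if ver else 'not installed'}")
```

If anything shows as "not installed", it's a sign you're either in the wrong environment or still need another `pip install`.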

### Put the files you need where you need them. 
So now that you have your directory set up and you've made your `venv`, you just need to bring together any files you know you'll need, ready to start writing code. For example, we emailed you some data files called `raw_data_spss.sav` and `raw_data_csv.csv`, so you should put those files in the 'data' sub-directory. If you had other files that needed to be there to start the project, like a codebook or an outline document, you would put them in an appropriate sub-directory too.
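As an optional check before you start, a short sketch like this confirms the data files are where you expect. It assumes your notebook is running from inside 'src', so 'data' is one folder up (`..`); if you run it from somewhere else, you'd need to adjust that path. `missing_data_files` is a hypothetical helper name, not part of any library.

```python
from pathlib import Path


def missing_data_files(data_dir, expected):
    """Return the expected file names that are not (yet) in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: this runs from inside 'src', so 'data' sits one level up.
    missing = missing_data_files(Path("..") / "data", ["raw_data_spss.sav", "raw_data_csv.csv"])
    print("All data files in place" if not missing else f"Missing: {missing}")
```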


And with that you're basically ready to make your first Python file. In this case you'll be working in this notebook, but in future you will open VS Code (or another IDE if you find one you like) and get to work.


## Part 2 - get on with it lads!

OK, OK. You're clearly ready to start working, so let's get to it.

### Importing modules

Just like setting up your folder structure and `venv` is the start of any project, the start of any Python file (whether it's a Jupyter notebook or just a normal Python file) is to `import` the modules you'll be using. We do this in the first cell of our notebook for a few reasons.

 1. 

