<div align="right">Python 2.7 Jupyter Notebook</div>

# Refreshing your virtual analysis environment and additional resources

### Introduction

We will release content at the start of each module and you will need to visit this notebook before starting a new module.
In the first step you will update the notebook itself. After refreshing the page (F5) additional instructions will be visible in the remainder of the Notebook.

Please remember that we will **force an update** and that any changes you make to this notebook will be lost when you refresh the notebook. Should you make any changes that you want to keep, please save the notebook under a different file name. Please make sure that you download copies of all the notebooks that you have changed and need to keep to your local device.
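
As a minimal sketch of the "save under a different file name" advice above, the cell below copies a notebook to a timestamped file name inside your environment. The helper name `backup_notebook` and the timestamp format are illustrative assumptions and not part of the course tooling. The path is the one used by the export command in section 1.1.

```python
import os
import shutil
import time


def backup_notebook(path, suffix=None):
    """Copy a notebook to a new file name and return the path of the copy."""
    if suffix is None:
        # A timestamp such as 20170131_142500 keeps successive backups apart.
        suffix = time.strftime("%Y%m%d_%H%M%S")
    root, ext = os.path.splitext(path)
    backup_path = "{0}_{1}{2}".format(root, suffix, ext)
    if os.path.exists(backup_path):
        # Refuse to overwrite an earlier backup with the same name.
        raise IOError("Backup already exists: " + backup_path)
    shutil.copy2(path, backup_path)
    return backup_path


# Assumed location, taken from the export command in section 1.1.
notebook = "/home/ubuntu/projects/module_0/M0_NB2_AdditionalResources.ipynb"
if os.path.exists(notebook):
    print(backup_notebook(notebook))
```

The copy stays on the virtual analysis environment, so you would still need to download it if you want a copy on your local device.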

## 1. Update this notebook
### 1.1 Update this notebook (M0_NB2_AdditionalResources)

<br>
<div class="alert alert-warning">
<b>Execute the cell below.</b>
</div>

In [None]:
## Forced update of M0_NB2_AdditionalResources.ipynb
!svn export https://github.com/getsmarter/bda/trunk/module_0/M0_NB2_AdditionalResources.ipynb /home/ubuntu/projects/module_0/ --force

## Update images required for M0_NB2 (Module 3)
!svn export https://github.com/getsmarter/bda/trunk/module_0/NotebooksRunningAll.png /home/ubuntu/projects/module_0/
!svn export https://github.com/getsmarter/bda/trunk/module_0/NotebooksRunningNone.png /home/ubuntu/projects/module_0/
!svn export https://github.com/getsmarter/bda/trunk/module_0/NotebooksRunningOne.png /home/ubuntu/projects/module_0/

You should see:
> `A    M0_NB2_AdditionalResources.ipynb`

> `Export complete.`

### 1.2 Refresh this page in your browser to reflect changes in the notebook. 
Press F5 or manually refresh this page. You should see a cell with the latest module available below.

Status: <font color='green'><b>Module 3 content added - Version 1.00.</b></font>

<br>
<div class="alert alert-warning">
<b>Note to all users.</b>
This week you will process higher volumes of data on your analysis environment, so you will need to shut down any notebooks that you are not actively using to ensure that the maximum amount of resources is available to iterate through the dataset. In many cases you will have multiple notebooks running at the same time, but it is good practice to shut down notebooks that are not in use. You typically pay for the cloud resources that you use, so shutting down unused items usually lowers your costs and improves the user experience, because more resources are available for the task at hand.
</div>

> **Shutting down running notebooks**:

> You can shut down notebooks in a number of ways. The first is to select "File" and then "Close and Halt" from within the running notebook. The second, which we recommend using during this module, is to navigate to your directory view by first selecting the Jupyter logo at the top left of your screen or switching to a browser window where you already have this open. Select the "Running" tab at the top of the screen and shut down all notebooks that you do not require by selecting the "Shutdown" option on the right of the screen. When completing notebook 3 of this module, we strongly recommend that you shut down all other notebooks.

> ![Running Notebooks](NotebooksRunningAll.png "Image of notebooks currently running.")

You can try this now and shut down all notebooks apart from the current notebook, M0_NB2_AdditionalResources. After shutting down other notebooks, your screen should look similar to the screenshot below.

> ![Running Notebooks](NotebooksRunningOne.png "Image of only the current notebook running.")

When you have completed your environment refresh or prior to starting a new notebook, you can revisit the "Running" tab on your directory view and shut down all notebooks.

> ![Running Notebooks](NotebooksRunningNone.png "Image of no running notebooks.")


> **NOTE to non-technical users**:

> We utilise this mechanism to avoid additional effort on your behalf to update and refresh your environment. The process has been designed to ensure that you can select and run all the cells below and you do not need to understand the detail of the steps completed. Optional items will be marked clearly.

> When executing the commands, you only need to review the status messages that appear below the cells that you execute. We do not want to overwrite changes that you have made to files and, apart from this notebook, our process will not overwrite any files that already exist in the various directories. Should you receive an error message that a file already exists, you will need to manually remove or rename the existing file before coming back to this notebook and resuming the process from the point where you received the error.

## 2. Additional resources - Module 3
**Please read the instructions that we provide carefully and proceed to execute all cells not specifically marked as optional.**

This week we will:
- Install additional libraries.
- Load data sources and additional files required.
- Load the notebooks required.

Should you receive error messages which you are not able to resolve with the guidelines below, please reach out to technical support to assist you.

### 2.1 Install additional libraries
> **Info**:

> You can install additional libraries in your command line or directly from a notebook.
This section is used to update your virtual analysis environment by installing the libraries that you will require to complete the exercises.
It is possible to install additional libraries, but we strongly encourage students not to install additional libraries while the course is in progress to ensure that your environment remains stable.

#### 2.1.1 tqdm

<br>
<div class="alert alert-warning">
<b>Execute the cell below.</b>
</div>

In [None]:
!pip install --target=/home/ubuntu/.local/lib/python2.7/site-packages tqdm

You should see a message that ends with "Successfully installed tqdm".
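
If you would like to confirm the result from Python as well, the sketch below checks whether a library can be imported in the current kernel. The helper name `is_importable` is an illustrative assumption. It relies only on the standard-library `importlib` module.

```python
import importlib


def is_importable(module_name):
    """Return True if the named module can be imported in this kernel."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Libraries installed in this section; extend the list as required.
for name in ["tqdm"]:
    status = "available" if is_importable(name) else "NOT available"
    print("{0}: {1}".format(name, status))
```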

That's it for libraries. Let's move on to populating data sets and other supporting files.

### 2.2 Data Sources and supporting files for Module 3
You should execute the cell below and will see a line containing the status for each statement.
This action should be performed once. Subsequent attempts to execute the cell will fail as the files will already exist. Should you wish to refresh the files, you will need to manually remove the files from the "module_3" directory before executing the cell again.

<br>
<div class="alert alert-warning">
<b>Execute the cell below to update images required for Module 3.</b>
</div>

In [None]:
# No additional data sources required.
    
# Images rendered in notebooks in module 3.
!svn export https://github.com/getsmarter/bda/trunk/module_3/Location.png /home/ubuntu/projects/module_3/
!svn export https://github.com/getsmarter/bda/trunk/module_3/dtu_map.png /home/ubuntu/projects/module_3/
!svn export https://github.com/getsmarter/bda/trunk/module_3/updated_file.png /home/ubuntu/projects/module_3/

### 2.3 Notebooks for Module 3
> **Info**:

> When executing the cell below you will see a status message for each of the lines.
The status will either be that "the destination file exists and will not be overwritten unless forced" or "export complete". The first message means that the notebook has not been refreshed, and you will need to go to the relevant directory and rename or remove the file before trying the refresh again. The second means that a copy of the file has been added to your virtual analysis environment.

Your directory will not contain any notebooks when you execute this command the first time and it should complete with three messages indicating "export complete".
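
To see in advance which message to expect for each file, you could check which notebooks are already present. The sketch below is one way to do this. The function name `split_present_missing` is hypothetical, and the directory and file names are taken from the export commands in this section.

```python
import os

# Assumed location and file names, copied from the export commands in this section.
MODULE_DIR = "/home/ubuntu/projects/module_3"
EXPECTED = [
    "M3_NB1_BSSID.ipynb",
    "M3_NB2_NoiseVsBias.ipynb",
    "M3_NB3_GeotaggingWiFiAP.ipynb",
]


def split_present_missing(directory, filenames):
    """Return two lists: files found in the directory and files not found."""
    present = [n for n in filenames if os.path.isfile(os.path.join(directory, n))]
    missing = [n for n in filenames if n not in present]
    return present, missing


present, missing = split_present_missing(MODULE_DIR, EXPECTED)
print("Already present (will not be overwritten): {0}".format(present))
print("Not yet present (will be exported): {0}".format(missing))
```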

<br>
<div class="alert alert-warning">
<b>Execute the cell below to update notebooks for Module 3.</b>
</div>

In [None]:
# Add Module 3 Notebooks 1-3 to your virtual analysis environment
!svn export https://github.com/getsmarter/bda/trunk/module_3/M3_NB1_BSSID.ipynb /home/ubuntu/projects/module_3/
!svn export https://github.com/getsmarter/bda/trunk/module_3/M3_NB2_NoiseVsBias.ipynb /home/ubuntu/projects/module_3/
!svn export https://github.com/getsmarter/bda/trunk/module_3/M3_NB3_GeotaggingWiFiAP.ipynb /home/ubuntu/projects/module_3/

### 2.4 Refresh specific notebooks [OPTIONAL] 
In cases where you have made changes to the notebook and wish to return the notebook to its original state, you will have to ensure that there are no files with the original names in the relevant directory.

> **Info**:

> You only need to look at this section if you wish to refresh specific notebooks to their original state.

You can rename or remove the existing files in the directory and then uncomment the relevant line or lines below to refresh your environment.
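
Renaming can also be done from a notebook cell. The sketch below moves an existing notebook aside by adding `_old` to its name, so that the original file name becomes free again. The helper name `set_aside` and the `_old` tag are illustrative assumptions. The example path is taken from the export commands in this notebook.

```python
import os


def set_aside(path, tag="_old"):
    """Rename an existing file so that a fresh copy can be exported in its place.

    Returns the new path, or None if there was nothing to rename.
    """
    if not os.path.exists(path):
        return None
    root, ext = os.path.splitext(path)
    new_path = root + tag + ext
    counter = 1
    # Avoid replacing an earlier renamed copy: try _old, _old1, _old2, ...
    while os.path.exists(new_path):
        new_path = "{0}{1}{2}{3}".format(root, tag, counter, ext)
        counter += 1
    os.rename(path, new_path)
    return new_path


# Assumed location; running this renames the file if it exists.
print(set_aside("/home/ubuntu/projects/module_3/M3_NB1_BSSID.ipynb"))
```

Keeping the `.ipynb` extension at the end of the new name means that the renamed copy can still be opened as a notebook.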

> **Note**:

> Statements in code blocks that start with # are treated as comments and are not executed when running the cell.

In [None]:
# Uncomment the line below to refresh M3_NB1_BSSID.ipynb
#!svn export https://github.com/getsmarter/bda/trunk/module_3/M3_NB1_BSSID.ipynb /home/ubuntu/projects/module_3/

# Uncomment the line below to refresh M3_NB2_NoiseVsBias.ipynb
#!svn export https://github.com/getsmarter/bda/trunk/module_3/M3_NB2_NoiseVsBias.ipynb /home/ubuntu/projects/module_3/

# Uncomment the line below to refresh M3_NB3_GeotaggingWiFiAP.ipynb
#!svn export https://github.com/getsmarter/bda/trunk/module_3/M3_NB3_GeotaggingWiFiAP.ipynb /home/ubuntu/projects/module_3/

## 3. Refresh content from previous modules
Use this section with care and make sure that you read the instructions carefully.
You can update libraries and refresh data sets without risk, but forcing updates to the notebooks will overwrite any changes you made to the files if you have not changed the file names.

### 3.1 Update libraries and data sources from previous modules [OPTIONAL]
> **Info**:

> You should not need to revisit this section. In cases where you need to reinstall specific libraries or where you did not execute the statements during previous modules, you will have to uncomment the update statements and execute the cell below.

In [None]:
# M1 libraries
#!pip install --target=/home/ubuntu/.local/lib/python2.7/site-packages folium
#!pip install --target=/home/ubuntu/.local/lib/python2.7/site-packages geocoder

# M2 libraries.
#!pip install --target=/home/ubuntu/.local/lib/python2.7/site-packages pandas-datareader
#!pip install --target=/home/ubuntu/.local/lib/python2.7/site-packages wikipedia
#!pip install --target=/home/ubuntu/.local/lib/python2.7/site-packages bandicoot

# Refresh the data directory to contain the files you require for the exercises using the force option to ensure
#    that you have a clean dataset to start your exercises. Note that this command can take a while to complete.
#!svn export https://github.com/getsmarter/bda/trunk/data/ /home/ubuntu/projects/data/ --force

### 3.2 Refresh previous notebooks [OPTIONAL]
The instructions for uncommenting statements and the process to follow are the same as in section 2.4 of this notebook. Those interested in testing the process can refresh the "Getting Started" notebook in the Orientation module.

> **Note**:
> The statements below will fail if the files already exist, and you will need to rename or remove files with the same names in the relevant directories. This is done in order to avoid overwriting changes that you have applied to the various notebooks.

In [None]:
# Orientation module
#!svn export https://github.com/getsmarter/bda/trunk/module_0/M0_NB1_GettingStarted.ipynb /home/ubuntu/projects/module_0/

# Module 1
#!svn export https://github.com/getsmarter/bda/trunk/module_1/M1_NB1_PythonIntro.ipynb /home/ubuntu/projects/module_1/
#!svn export https://github.com/getsmarter/bda/trunk/module_1/M1_NB2_DataAnalysisBasics.ipynb /home/ubuntu/projects/module_1/
#!svn export https://github.com/getsmarter/bda/trunk/module_1/M1_NB3_DataConsiderations.ipynb /home/ubuntu/projects/module_1/

# Module 2
#!svn export https://github.com/getsmarter/bda/trunk/module_2/M2_NB1_SourcesOfData.ipynb /home/ubuntu/projects/module_2/
#!svn export https://github.com/getsmarter/bda/trunk/module_2/M2_NB2_FunFIntroduction.ipynb /home/ubuntu/projects/module_2/
#!svn export https://github.com/getsmarter/bda/trunk/module_2/M2_NB3_CollectYourOwnData.ipynb /home/ubuntu/projects/module_2/

## 4. Additional links or resources

#### Message from your Tutor Group

Welcome to the third module of MIT Big Data and Social Analytics.

After working through video material and exercises on first-order analysis and data exploration, you will work on your practical assessments.

Modules 1 and 2 contained a large number of examples to help you to understand the code samples provided and allow you to explore additional options should you wish to do so. In this module we start applying what you have mastered in previous modules. We explore BSSIDs, then move to an example of noise vs bias, and lastly, we apply some of the concepts introduced in transforming the full dataset to geotag WiFi access points. While some of the steps have been simplified, we align closely to the steps performed in the referenced papers to provide you with a view of the work required to execute similar projects.

**Notebook 1: Test your intuition about BSSIDs**
This notebook introduces BSSIDs in more detail and revisits some of the concepts around testing on yourself. We repeat some of the data exploration steps for a single user and offer you the chance to think about what you would expect to see in the data. 

**Notebook 2: Exploring Noise vs Bias**
This notebook uses a generated dataset to demonstrate the difference between noise and bias. We also introduce the concept of working with image files as input to your analysis. While we do not recommend that you attempt any image processing on the supplied environment, it is important to understand that you are not restricted to text files or databases as input to your analyses.

**Notebook 3: Geotagging WiFi access points**
This notebook starts by geotagging access points for a single user in the first section and then moves on to repeat the exercise for a large dataset in the second section. This is the first time that we start working with the full record set and you will need to make sure that you shut down any other notebooks that are not in use to ensure that you have the maximum amount of resources available to complete the exercise. We iterate through a directory and load the various files, identify just under 700 000 geotags and perform a number of transforms on our dataset.

Please engage with the tutor team in the forums when you get stuck or when you learn something that you think is really useful.

We sincerely hope that you enjoy this course and acquire skills that are of use to you in your personal journey.
