**COURSE**: Análisis Geoespacial (Geospatial Analysis), Departamento de Geociencias y Medio Ambiente, Universidad Nacional de Colombia - sede Medellín <br/>
**Professor**: Edier Aristizábal (evaristizabalg@unal.edu.co) <br />
**Credits**: The content of this notebook is taken from several sources: [Introduction to Python GIS](https://automating-gis-processes.github.io/CSC18/lessons/L1/Intro-Python-GIS.html), Python for geosciences by [Mauricio Cordeiro](https://cordmaur.medium.com/), and the courses and book openly and freely published by [Dani Arribas-Bel](http://darribas.org/) (University of Liverpool) and Sergio Rey ([Center for Geospatial Sciences, University of California, Riverside](http://spatial.ucr.edu)). Every effort has been made to trace the copyright holders of the materials used in this book. The author apologizes for any unintentional omissions and would be pleased to add an acknowledgment in future editions.

# Computational tools for Geographic Data Science

This tutorial introduces some of the tools we will be working with throughout the course. Although very basic and seemingly abstract, everything shown here will become the foundation on which we will build more sophisticated tasks.

## Open Source

This course will introduce you to a series of computational tools that make the life of the Data Scientist possible, and much easier. All of them are [open-source](https://en.wikipedia.org/wiki/Open_source), which means the creators of these pieces of software have made the source code available for people to use, study, modify, and re-distribute. This has allowed a large ecosystem to grow that today is one of the best options for scientific computing, and is widely used in both industry and academia. Thanks to this, this course can be taught entirely with freely available tools that you can install on any of your computers.

If you want to learn more about open-source and free software, here are a few links:

* **[Video]**: brief [explanation](https://www.youtube.com/watch?v=Tyd0FO0tko8) of open source.
* **[Book]** [The Cathedral and the Bazaar](https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar): classic book, freely available, that documents the benefits and history of open-source software.

## Python

The main bulk of the course relies on the [Python](https://www.python.org/) programming language. Python is a [high-level](https://en.wikipedia.org/wiki/High-level_programming_language) programming language widely used today.

<center><img src="https://www.python.org/static/img/python-logo@2x.png" width="1000"></center>

Python is an extremely useful language to learn for GIS, since many (or most) GIS software packages (such as ArcGIS, QGIS, PostGIS etc.) provide an interface for doing analysis through Python scripting. During this course, we will mostly focus on doing geospatial analysis without third-party software such as ArcGIS. Why? There are several reasons for doing it in Python without any additional software:

* Everything is free: you don’t need to buy an expensive license for ArcGIS (for example).
* You will learn and understand much more deeply how different geoprocessing operations work.
* Python scales well: its scientific libraries are widely used for analysing Big Data.
* Python is highly flexible: through its libraries it supports virtually any data format you can imagine.
* Using Python (or any other open-source programming language) supports open-source software and code, and open science, by making it possible for everyone to reproduce your work free of charge.
* You can plug in and chain different third-party software to build, e.g., a fancy web-GIS application (using e.g. GeoDjango with PostGIS as a back-end).

This course uses Python because it has emerged as one of the main and most solid options for Data Science, together with other free alternatives such as R. Python is widely used for data processing and analysis both in academia and in industry. There is a vibrant and growing scientific community, working at both universities and companies, that supports and enhances its capabilities for data analysis by providing new extensions and refining existing ones (a.k.a. libraries, see below). In the geospatial world, Python is also very widely adopted, being the selected language for scripting in both [ArcGIS](http://www.esri.com/software/arcgis) and [QGIS](http://qgis.org). All of this means that, whether you are thinking of continuing in Higher Education or trying to find a job in industry, Python will be an important asset that employers value significantly.

Python is also an interpreted language: the code is "dynamically interpreted", which means it is run on the fly without first needing to be compiled. This is in contrast to compiled languages (such as C), whose code first needs to be converted into machine code (i.e. compiled) before it can be run. With Python, one does not need to worry about compilation and can just write code, evaluate it, fix it, re-evaluate it, etc. in a quick cycle, making it a very productive tool.

The two most popular Python distributions for data science are [**Anaconda** and **Miniconda**](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda). **Anaconda** is an open-source Python distribution purpose-built for data science, machine learning, and large-scale data processing. It includes the core Python language, over 1,500 data science packages, a package management system called conda, IPython (an interactive Python interpreter), and much more. While it is a very comprehensive distribution, it is also quite large, so it can take a while to download and consumes a lot of disk space. **Miniconda**, on the other hand, is a slimmed-down version of Anaconda that includes all of the same components except for the 1,500 pre-installed data science packages. Instead, we can install these packages individually as needed using conda (the Anaconda/Miniconda package manager).

<center><img src="https://sf.ezoiccdn.com/ezoimgfmt/linuxnetmag.com/wp-content/uploads/2020/11/MinicondavsAnaconda.jpg?ezimgfmt=ng%3Awebp%2Fngcb1%2Frs%3Adevice%2Frscb1-1" width="800"></center>

Choose **Anaconda** if you:

* Are new to conda or Python.
* Like the convenience of having Python and over 1,500 scientific packages automatically installed at once.
* Have the time and disk space (a few minutes and 3 GB).
* Do not want to individually install each of the packages you want to use.

Choose **Miniconda** if you:

* Do not mind installing each of the packages you want to use individually.
* Do not have time or disk space to install over 1,500 packages at once.
* Want fast access to Python and the conda commands and you wish to sort out the other programs later.

Similar to how there are many different Python distributions, there are also several different package managers available. Despite having several options, most package managers function in the same manner and use a simple command structure. Each Python distribution is usually bundled with a specific package manager. Some of the more common ones are pip and conda.

Below is a list of useful libraries that help you get going when doing data analysis or GIS in Python. If you are interested, or when you start using these modules in your own work, it is highly recommended to read the documentation on the web page of each module you use:

### Data analysis & visualization:
* NumPy –> Fundamental package for scientific computing with Python
* Pandas –> High-performance, easy-to-use data structures and data analysis tools
* SciPy –> A collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization and statistics
* Statsmodels –> Statistical models for Python
* Scikit-learn –> Machine learning for Python (classification, regression, clustering, etc.)
* Matplotlib –> Basic plotting library for Python
* Seaborn –> Statistical data visualization
* Bokeh –> Interactive visualizations for the web (also maps)
* Plotly –> Interactive visualizations (also maps) for the web (the Python library is open source; Plotly also offers commercial products)
* Dash –> Building analytical web applications with Python (no JavaScript required)

### GIS:
* GDAL –> Fundamental package for processing vector and raster data formats (many modules below depend on this). Used for raster processing.
* Geopandas –> Working with geospatial data in Python made easier, combines the capabilities of pandas and shapely.
* Shapely –> Python package for manipulation and analysis of planar geometric objects (based on widely deployed GEOS).
* Fiona –> Reading and writing spatial data (alternative for geopandas).
* Pyproj –> Performs cartographic transformations and geodetic computations (based on PROJ.4).
* PyCRS –> Working easily with different CRS specifications (EPSG, ESRI, Proj4)
* Pysal –> Library of spatial analysis functions written in Python.
* Geopy –> Geocoding library: coordinates to address <-> address to coordinates.
* GeoViews –> Interactive Maps for the web.
* Geoplot –> High-level geospatial data visualization library for Python.
* GeoNotebook –> Desktop GIS-like environment for visualizing and interacting with spatial data using Python (based on Jupyter Notebooks)
* OSMnx –> Python for street networks. Retrieve, construct, analyze, and visualize street networks from OpenStreetMap
* Networkx –> Network analysis and routing in Python (e.g. Dijkstra and A* algorithms).
* Cartopy –> Make drawing maps for data analysis and visualisation as easy as possible.
* Scipy.spatial –> Spatial algorithms and data structures.
* Rtree –> Spatial indexing for Python for quick spatial lookups.
* Rasterio –> Clean and fast geospatial raster I/O for Python.
* Rasterstats –> A module for summarizing geospatial raster datasets based on vector geometries (e.g. conduct zonal statistics).
* RSGISLib –> Remote Sensing and GIS Software Library for Python.

<center><img  src="https://www.alpha-quantum.com/blog/wp-content/uploads/2020/05/img_5ecd875868599.png" width="1000" class="center"></center>

### Installing Python via Anaconda or Miniconda

It is highly recommended that you install the [Anaconda Python Distribution](https://docs.continuum.io/anaconda/install/). It will make your life much easier. You can download and install Anaconda on Windows, macOS and Linux.

<center><img src="https://miro.medium.com/max/700/1*TaL5qaSnR2QOef69e49LPQ.png" width="1000"></center>

If you prefer [Miniconda](https://docs.conda.io/en/latest/miniconda.html), you will need to download and install it. **Miniconda** does not have a graphical user interface like Anaconda's; it has to be used through the command line.

<center><img src="https://katiekodes.com/images/screenshot-miniconda-02-execute.png" width="700"></center>

Next, the installer will ask you whether you want to do either of two things:

* Add “Anaconda” to my “PATH” environment variable. The installer text advises against it, but I recommend checking it.
* Register “Anaconda” as my “default Python 3.7” environment. This option comes checked by default, and it is recommended.

Both Anaconda and Miniconda come with Conda. And because Conda is a package manager, what you can accomplish with Anaconda, you can do with Miniconda. In other words, the steps in the Miniconda section (creating a custom environment with Conda) will work after you've gone through the Anaconda section.

### Without Anaconda

If you prefer to install Python without Anaconda or Miniconda, go to the [Python Releases](https://www.python.org/downloads/) page and download the executable installer for the latest stable release. After the download is complete, run the installer. On the first page of the installer, be sure to select the “Add Python to PATH” option, and click through the remaining setup steps, leaving all the pre-selected installation defaults.

<center><img src="https://docs.blender.org/manual/es/2.79/_images/about_contribute_install_windows_installer.png" width="1000"></center>

In this case, you need to use pip to install Jupyter or any other package. [Pip](https://pypi.org/project/pip/) is a package manager designed exclusively to install Python packages. In contrast, Conda is an open-source installer and package-management tool that can handle both Python and non-Python library dependencies.

Conda offers virtual environment capabilities and can run on multiple operating systems like Windows, Linux, and macOS. With Conda, you will be able to create, load, save, and switch between different environments. 

```
pip install jupyterlab
```

This command downloads and installs the JupyterLab package. Once it is complete, we can check that JupyterLab was successfully installed by running `jupyter lab` from a Terminal (Mac) / Command Prompt (Windows). This will start the Jupyter server, print out some information about it in the console, and open a new browser tab at http://localhost:8888.
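If the `jupyter` command is not recognised, a frequent cause is that the folder containing Python's scripts is not on `PATH`. One way to check this from Python is the standard library's `shutil.which`, which returns the full path of a command, or `None` if it cannot be found on `PATH`. A minimal sketch (the helper name `find_command` is just illustrative):

```python
import shutil


def find_command(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


# Check the commands used in this section
for command in ["python", "pip", "jupyter"]:
    path = find_command(command)
    if path is None:
        print(f"{command}: not found on PATH")
    else:
        print(f"{command}: {path}")
```

On some systems the interpreter is called `python3` rather than `python`, so a "not found" for one name does not necessarily mean Python is missing.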

<center><img src="https://miro.medium.com/max/2400/1*dAZ5etmJphQQCP-ys6Wd_Q.png" width="800"></center>

## Starting Jupyter Lab

The main computational tool you will be using during this course is [Jupyter Lab](http://jupyter.org/). Jupyter Lab is an interactive web interface for code development that follows the concept known as REPL (Read-Evaluate-Print-Loop), widely used by data scientists. The great advantage of a REPL environment is that we can develop our code gradually, executing it command by command and checking the results. Additionally, we can keep explanatory text together with the code we develop and view the results, all in the same environment, without having to switch between the command line and other applications such as an image viewer. Notebooks are a convenient way to weave together text, code and the output it produces in a single file that you can then share, edit and modify. You can think of notebooks as the Word documents of Data Scientists.
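The Read-Evaluate-Print-Loop idea can be illustrated with a toy loop in plain Python. This is only a sketch of the concept, not how Jupyter is implemented: it reads expressions from a hard-coded list (instead of the keyboard), evaluates them with the built-in `eval`, and prints each result.

```python
# Expressions a user might type, one at a time
commands = ["3 + 5", "2 / 3", "(3 + 5) * 2 / 3"]

results = []
for line in commands:       # Loop: take the next command
    value = eval(line)      # Read + Evaluate: parse and run the expression
    print(f">>> {line}")    # Print: echo the input...
    print(value)            # ...and show the result
    results.append(value)
```

Jupyter's real loop is far more elaborate (code runs in a separate kernel process and can contain full statements, not just expressions). Also note that `eval` should never be used on untrusted input.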

A notebook comprises a single file that stores narrative text, computer code, and the output produced by code. Storing both narrative and computational work in a single file means that the entire workflow can be recorded and documented in the same place, without having to resort to ancillary devices (like a paper notebook). A second feature of notebooks is that they allow for interactive work. Modern computational work benefits from the ability to try, fail, tinker, and iterate quickly until a working solution is found. Notebooks embody this quality and enable the user to work interactively. Whether the computation takes place on a laptop or in a data center, notebooks provide the same interface for interactive computing, lowering the cognitive load required to scale up. Third, notebooks have interoperability built in. The notebook format is designed for recording and sharing computational work, but not necessarily for other stages of the research cycle. To widen the range of possibilities and applications, notebooks are designed to be easily convertible into other formats. For example, while a specific application is required to open and edit a notebook, notebooks can be converted into PDF files that can be read, printed, and annotated without any technical software.

A Jupyter notebook is a plain-text file with the **.ipynb** extension, which makes it easy to move around, sync, and track over time. Internally, it is structured as a JavaScript Object Notation (JSON) document that records the state of the notebook, so notebooks also integrate well with a host of modern web technologies. The atomic element that makes up a notebook is called a cell. Cells are self-contained chunks of content that contain either text or code. In fact, a notebook can be thought of as an ordered collection of cells of two types: text and code.
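To see this structure, the sketch below parses a tiny, hand-written notebook using only Python's built-in `json` module. The content is a simplified example in the nbformat 4 layout (real notebooks carry more metadata and outputs); note that text cells are stored with `cell_type` set to `"markdown"`.

```python
import json

# A minimal notebook in the nbformat 4 JSON layout, written by hand
notebook_text = """
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# A heading"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "outputs": [], "source": ["print('Hello world!')"]}
  ]
}
"""

nb = json.loads(notebook_text)

# A notebook is an ordered list of cells, each with a type and its source text
for i, cell in enumerate(nb["cells"]):
    print(i, cell["cell_type"], "".join(cell["source"]))
```

A real notebook can be inspected the same way by opening the `.ipynb` file and passing it to `json.load`.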

### Creating an environment and installing packages

Before starting with Jupyter Lab, it is recommended to create an environment for spatial analysis. Environments can be thought of as isolated repositories in which Python packages are installed, in order to avoid conflicts between packages and versions. For example, if you have code developed with NumPy 1.18 that does not work with a newer version such as 1.20, you can (and should) create a specific environment for each version.

Conda initially comes with a standard environment called base (root). Before installing our packages, we will create a new environment by clicking on the Environments tab on the left and then on the Create button below the list of environments. We can give this new environment any name, such as “geospatial”, and select Python 3.7 as the main package.

<center><img src="https://miro.medium.com/max/700/1*EEcFGtMxuu42T42jgmvSzA.png" width="1000"></center>

To install a new package, search for its name using the search bar (don’t forget to select All from the dropdown box), select the package from the list and click Apply. Because packages have dependencies, conda will also install any required packages that do not yet exist in the current environment.

<center><img src="https://miro.medium.com/max/700/1*2BZZhxcof1eu1INPMyny6w.png" width="1000"></center>

To write and execute our code we will use Jupyter Lab, an evolution of Jupyter Notebook with some additional functionality. With the new environment selected, go back to Home and click Install, just below the Jupyter Lab icon.

<center><img src="https://miro.medium.com/max/700/1*5pGWfU63Hf_nW-NKQ5gTWw.png" width="800"></center>

If you have Miniconda installed (this can also be done with Anaconda), open the command line: press Windows + R and type cmd, or search for cmd in the Start menu. Click on it and the Command Prompt will open. Use the following command to activate the environment:
```
conda activate
```

This command takes you to the base environment. To create an environment called "geo" where you will install your packages, use the following command:
```
conda create --name geo
```

You can actually give the environment any name you prefer. To activate the environment you created, use:
```
conda activate geo
```

The command prompt should now show that you are in the geo environment. To install the libraries you want in it, use the command:
```
conda install pandas
```
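If JupyterLab is not yet installed in this environment, install it the same way (`conda install jupyterlab`). Then launch it by typing in the same terminal:

```
jupyter lab
```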

Navigate to the folder where you have placed the notebook file and click on it. This will open the notebook in a new tab. You are now in the interactive version of the notebook!

When you are finished with the session, you can save the notebook with `File -> Save Notebook`. Everything you do on the notebook (text, code and output) is saved into an `.ipynb` file that you can open later, share, etc.
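Once a notebook is open, you can confirm from a code cell which packages the active environment provides, and which version was installed, with the standard library's `importlib.util.find_spec` and `importlib.metadata.version` (Python 3.8+). The package list below is just an example, and `package_status` is a hypothetical helper; note that a package's import name and its distribution name do not always match (for instance, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util
from importlib import metadata


def package_status(import_name, dist_name=None):
    """Return (is_importable, version_or_None) for a package in this environment."""
    if importlib.util.find_spec(import_name) is None:
        return False, None
    try:
        return True, metadata.version(dist_name or import_name)
    except metadata.PackageNotFoundError:
        # Importable, but no installed distribution under that name
        return True, None


for import_name, dist_name in [("pandas", "pandas"), ("numpy", "numpy"),
                               ("sklearn", "scikit-learn")]:
    ok, version = package_status(import_name, dist_name)
    print(f"{import_name:10s} installed={ok} version={version}")
```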

## Add virtual environment to Jupyter

Jupyter Notebook makes sure that the IPython kernel is available, but you have to manually add a kernel for a different version of Python or for a virtual environment. First, make sure your environment is activated with `conda activate myenv` (replace `myenv` with the name of your environment, e.g. `geo`). Next, install ipykernel, which provides the IPython kernel for Jupyter:

```
pip install ipykernel
```

Next you can add your virtual environment to Jupyter by typing:

```
python -m ipykernel install --user --name=myenv
```
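To confirm that a notebook is actually running on the kernel of the environment you registered, you can inspect a few standard-library values from a code cell: `sys.prefix` (the root folder of the interpreter's environment) and `sys.executable` (the interpreter itself). Conda also sets the `CONDA_DEFAULT_ENV` variable on `conda activate`, but a kernel inherits it from the terminal where Jupyter was launched, so `sys.prefix` is the more reliable indicator. A minimal sketch:

```python
import os
import sys

# Root folder of the environment this kernel's interpreter belongs to;
# for a conda environment this typically ends in .../envs/<name>
print("Environment prefix:", sys.prefix)

# Full path of the Python executable running this kernel
print("Interpreter:", sys.executable)

# Inherited from the terminal that launched Jupyter (None if not set)
print("CONDA_DEFAULT_ENV:", os.environ.get("CONDA_DEFAULT_ENV"))
```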

After you delete your virtual environment, you will also want to remove it from Jupyter. First, let's see which kernels are available. You can list them with:
```
jupyter kernelspec list
```
This lists the kernels available in Jupyter, together with the folders where they are stored. Now, to uninstall the kernel, you can type:
```
jupyter kernelspec uninstall myenv
```
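The same list can also be obtained from Python through the `jupyter_client` package (normally installed together with Jupyter), whose `KernelSpecManager.find_kernel_specs()` returns a dictionary mapping each kernel name to the folder where its specification is stored. A sketch, where `list_kernels` is a hypothetical helper that returns an empty dictionary if `jupyter_client` is not available:

```python
try:
    from jupyter_client.kernelspec import KernelSpecManager
except ImportError:
    KernelSpecManager = None


def list_kernels():
    """Return {kernel_name: spec_folder}, or {} if jupyter_client is missing."""
    if KernelSpecManager is None:
        return {}
    return KernelSpecManager().find_kernel_specs()


for name, folder in list_kernels().items():
    print(f"{name:20s} {folder}")
```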

## Notebooks

The main building blocks of notebooks are cells. These are chunks of the same type of content which can be cut, pasted, and moved around in a notebook. Cells can be of two types:

* **Text**, like the one where this is written.
* **Code**, like the following one below:

```python
# This is a code cell
```

The notebook lets you run many commands through the "Command Palette", which in earlier versions of JupyterLab is found on the third tab of the left pane; in recent versions you can open it with `Ctrl+Shift+C` (`Cmd+Shift+C` on macOS).

For example, you can create a new cell by searching for "Insert Cell". By default, this will be a code cell, but you can change that with the cell-type dropdown in the notebook toolbar (`Cell` -> `Cell Type` in the classic Notebook interface). Choose `Markdown` for a text cell. Once a new cell is created, you can edit it by clicking on it, which places the cursor inside so you can start typing.

**Pro tip!**: cells can also be created with shortcuts. If you press `<escape>` and then `b` (`a`), a new cell will be created below (above). There is a whole bunch of shortcuts you can explore by pressing `<escape>` and `h` (press `<escape>` again to leave the help).

A particularly useful feature of notebooks is that you can save, in the same place, the code you use to generate any output (tables, figures, etc.). As an example, the cell below contains a snippet of Python that prints a statement. This statement is then printed below the cell and recorded in the notebook as output:

```python
print("Hello world!!!")
print("Once again")
```

```
Hello world!!!
Once again
```

Note also how the notebook has automatic syntax highlighting support for Python. This makes the code much more readable and understandable. More on Python below.

### Markdown

Text cells in a notebook use the [Github Flavored Markdown](https://help.github.com/articles/github-flavored-markdown/) markup language. This means you can write plain text following some simple rules, and the notebook renders a more visually appealing version of it. Markdown is a popular set of rules to create rich content (e.g. headers, lists, links) from flat, plain-text files without being as complex and sophisticated as other typesetting approaches. For more demanding or specific tasks, text cells can further integrate LaTeX notation. This means we can write most forms of narrative relying on Markdown, which is more straightforward, and rely on LaTeX for more sophisticated parts, such as equations. Covering Markdown rules in detail is beyond the scope of this chapter, but the interested reader can inspect the [official Github specification](https://docs.github.com/en/github/writing-on-github/basic-writing-and-formatting-syntax) of the so-called Github-flavored markdown, the one adopted by the notebook.
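For example, typing the following in a Markdown cell renders a displayed equation, here the Euclidean distance between two points $(x_1, y_1)$ and $(x_2, y_2)$, chosen only as an illustration. Wrapping an expression in single dollar signs, as in `$x^2$`, produces inline math instead:

```latex
$$
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$
```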

Let's see some examples:

* **BOLD**:

`This is **bold**.`

Is rendered:

This is **bold**.

* **ITALIC**:

`This is *italic*.`

Is rendered:

This is *italic*.

* **LISTS**:

You can create unnumbered lists:

```
* Item 1
* Item 2
* ...
```

Which will produce:

* Item 1
* Item 2
* ...

Or you can create numbered lists:

```
1. First element
1. Second element
1. ...
```

And get:

1. First element
1. Second element
1. ...

Note that you don't have to write the actual number of the element, just using `1.` always produces a numbered list.

You can also nest lists:

```
* First unnumbered element, which can be split into:

    1. One numbered element
    2. Another numbered element

* Second element.
* ...
```

* First unnumbered element, which can be split into:

    1. One numbered element
    2. Another numbered element

* Second element.
* ...

This creates many opportunities to combine things nicely.

* **LINKS**

`You can easily create hyperlinks, for example to [Wikipedia](https://www.wikipedia.org/).`

You can easily create hyperlinks, for example to [Wikipedia](https://www.wikipedia.org/).

* **HEADINGS**: including `#` before a line causes it to render a heading.

---

`# This is Header 1`

Turns into:

# This is Header 1

---

`## This is Header 2`

Turns into:

## This is Header 2

---

`### This is Header 3`

Turns into:

### This is Header 3

And so on...

---

You can find a more detailed introduction at the following links:

> https://guides.github.com/features/mastering-markdown/

> https://help.github.com/articles/markdown-basics/

> https://help.github.com/articles/github-flavored-markdown/

### Rich content in a notebook

Notebooks can also include rich content from the web. For that, we need to import the `display` module:

```python
import IPython.display as display
```

This makes additional functionality available that allows us to embed rich content. For example, we can easily include a YouTube clip by passing its ID:

```python
display.YouTubeVideo('iinQDhsdE9s')
```

Or we can pass standard HTML code:

```python
display.HTML("""<table>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>""")
```

Which renders as:

| Header 1 | Header 2 |
| --- | --- |
| row 1, cell 1 | row 1, cell 2 |
| row 2, cell 1 | row 2, cell 2 |


Note that this opens the door for including a large number of elements from the web, as an `iframe` is also allowed. For example, interactive maps can be included:

```python
osm = """
<iframe width="425" height="350" frameborder="0" scrolling="no" marginheight="0" marginwidth="0" src="http://www.openstreetmap.org/export/embed.html?bbox=-2.9662737250328064%2C53.400500637844594%2C-2.964626848697662%2C53.402550738394034&amp;layer=mapnik" style="border: 1px solid black"></iframe><br/><small><a href="http://www.openstreetmap.org/#map=19/53.40153/-2.96545">View Larger Map</a></small>
"""
display.HTML(osm)
```
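Rather than copying the embed URL by hand, it can be assembled from a bounding box. The sketch below assumes the `export/embed.html` URL pattern used in the cell above, with `bbox` given as `min_lon,min_lat,max_lon,max_lat`; the helper name `osm_embed_html` is hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def osm_embed_html(min_lon, min_lat, max_lon, max_lat, width=425, height=350):
    """Return an <iframe> snippet showing an OpenStreetMap bounding box.

    Coordinates are longitude/latitude in decimal degrees.
    """
    query = urlencode({
        "bbox": f"{min_lon},{min_lat},{max_lon},{max_lat}",  # commas become %2C
        "layer": "mapnik",
    })
    src = f"https://www.openstreetmap.org/export/embed.html?{query}"
    # escape() turns "&" into "&amp;", as in the hand-written iframe above
    return (f'<iframe width="{width}" height="{height}" frameborder="0" '
            f'scrolling="no" src="{escape(src)}" '
            'style="border: 1px solid black"></iframe>')


# Same bounding box as the cell above
iframe = osm_embed_html(-2.9662737250328064, 53.400500637844594,
                        -2.964626848697662, 53.402550738394034)
print(iframe)
```

In a notebook, `display.HTML(iframe)` would then render the map.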

Or sound content:

```python
sound = '''
<iframe width="100%" height="300" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/327114763&color=%23ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&show_teaser=true&visual=true"></iframe>'''
display.HTML(sound)
```

## Docker container

Containers are a lightweight version of a virtual machine, which is a program that enables an entire operating system to run compartmentalised on top of another operating system. Containers allow us to encapsulate an entire environment (or platform) in a format that is easy to transfer and reproduce in a variety of computational contexts. The most popular container technology nowadays is Docker, and the opportunities it provides for building transparent and transferable infrastructure for data science are starting to be explored.

<center><img src="https://miro.medium.com/max/2400/1*JUOITpaBdlrMP9D__-K5Fw.png" width="1000"></center>

Docker allows us to create a "container" that includes all the tools required to access the content of the book interactively. But what exactly is a container? There are several ways to describe it, from very technical to more intuitive ones. In this context, we will focus on a general understanding rather than on the technical details behind its magic. One can think of a container as, well, a "box" that includes everything required to run a certain set of software. This box can be moved around from machine to machine, and the computations it executes will remain exactly the same. In fact, the content of the box remains exactly the same, bit by bit. When we download a container onto a computer, be it a laptop or a data center, we are not installing the software it contains through the usual channels for the platform on which we are going to run it. Instead, we are downloading the software in the form in which it was installed when the container was originally built and packaged, and for the operating system that was also packaged originally. This is the real advantage: build once, run everywhere. For the experienced reader, this might sound very much like its older sibling, the virtual machine. Although there are similarities between both technologies, containers are more lightweight and can be run much more swiftly than virtual machines. This isolated box interacts with the rest of the computer through several links that connect the two. In the case of this book, since JupyterLab is a client-server application, the server runs inside the container and we connect to it through two main "doors": first, through the browser, we access the main Lab interface; and second, we "mount" a folder inside the container so we can use software inside the container to edit files that are stored outside, on the host machine.

<center><img src="https://www.saagie.com/wp-content/uploads/2019/07/2-1024x251.png" width="1000"></center>

This is the recommended approach if you meet the following requirements:

- [x] You have admin rights over your machine
- [x] You are running either Windows 10 Pro, macOS, or Linux

In that case, Docker is the preferred alternative. It provides a stable platform to run complex software setups like the one required in this context. Docker is a containerisation technology that allows you to run pre-packaged (containerised) software under controlled environments. Relying on Docker, the [gds_env](https://github.com/darribas/gds_env) project provides a containerised platform for Geographic Data Science.

The steps to install this (given you meet the requirements above) include:

- Obtain a copy of Docker and install it:
    - Windows 10 Pro/Enterprise: Install Docker Desktop for Windows
    - macOS: Get started with Docker Desktop for Mac
- Open a terminal or shell. How to do this will depend on your operating system:

    - Windows: we recommend PowerShell. Type "PowerShell" on the startup menu and, when it comes up, hit enter. This will open a terminal for you.
    - macOS: use the Terminal.app. You can find it in the Applications folder, within the Utilities subfolder.
    - Linux: if you are running Linux, you probably already have a terminal application of preference. Almost any Linux distribution comes with a terminal or shell app built in.
    
Download, or "pull", our [container](https://hub.docker.com/r/darribas/gds/). To do this, run the following command in the terminal:

<center><img src="https://geekflare.com/wp-content/uploads/2020/04/docker-hub-repo.png" width="1000"></center>

```
docker pull darribas/gds:3.0
```

- Once the command above has finished downloading your GDS stack, you are ready to go! To start a Jupyter session, run the following command in the same terminal:

```
docker run --rm -ti -p 8888:8888 -v ${PWD}:/home/jovyan/work darribas/gds:3.0
```

The parts of this command are:

- `docker run`: Docker does many things; to tell it that we want to run a new container, we need to say so explicitly.
- `--rm`: this flag ensures the container is removed when you close it. This in turn makes sure that every time you run it again, you start afresh with exactly the same setup.
- `-ti`: these flags keep the container running in the foreground, in interactive mode (`-i`) with a terminal attached (`-t`), rather than in the background.
- `-p 8888:8888`: this forwards port 8888 from inside the container to the host machine (i.e. your laptop). This step is crucial because it allows us to interact with the server, so Jupyter can "send" JupyterLab across and we can access it in our web browser.
- `-v ${PWD}:/home/jovyan/work`: similarly, this flag "mounts" the folder from which the command is run in the terminal (`${PWD}`, the current working directory) into the container, so it is visible and editable from inside the container. This folder will be available as the container's `work` folder.
- `darribas/gds:3.0`: finally, we also need to specify which image we want to run. In this case, we run the image pulled in the previous step.

This will start the JupyterLab server inside the container. Please do not close the terminal window until you are done working!

- Open your favorite browser (preferably Firefox or Chrome) and point it to `localhost:8888`.
- You will be asked for a password or a token. To access the lab, copy the token printed in the terminal (a long string of letters and digits, such as `ae7e8017f3e97658a218ec2c2d1fbcc894f09d80f6b5f79c`), enter it in the box, and click on "Log in". Now you are in!



## Intro to Python

The standard Python language includes some data structures (e.g. lists, dictionaries, etc. See below) and allows many basic operations (e.g. sum, product, etc.). For example, right out of the box, and without any further action needed, you can use Python as a calculator:

In [12]:
3 + 5

8

In [13]:
2. / 3

0.6666666666666666

In [14]:
(3 + 5) * 2. / 3

5.333333333333333

However, the strength of Python as a data analysis tool comes from the extensions provided separately that add functionality and provide access to much more sophisticated data structures and functions. These come in the form of packages, or libraries, that once installed need to be imported into a session.

In this course, we will be using many of the core libraries of what has been called the "PyData stack", the set of libraries that make Python a full-fledged system for Data Science. We will introduce them gradually as we need them for particular tasks but, for now, let us have a look at the foundational library, [numpy](http://www.numpy.org/) (short for Numerical Python). Importing it involves the following line of code:

In [1]:
import numpy as np # we rename it in the session as `np` by convention

Note how we import it *and* rename it in the session, from `numpy` to `np`, which is shorter and more convenient.

Note also how comments work in Python: everything in a line *after* the `#` sign is ignored by Python when it evaluates the code. This allows you to insert comments that Python will ignore but that can help you and others better understand the code.

Once imports are out of the way, let us start exploring what we can do with `numpy`. One of the easiest tasks is to create a sequence of numbers:

In [16]:
seq = np.arange(10)
seq

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The first thing to note is that, in line 1, we create the sequence by calling the function `arange` and assign it to an object called `seq` (it could have been called anything else, pick your favorite) and, in line 2, we have it printed as the output of the cell.

Another interesting feature is that we call the `numpy` function `arange` by adding `np.` in front of it. This indicates that the function comes explicitly from `numpy`. To see why this is necessary, you can try generating the sequence without `np.`:

In [17]:
# NOTE: uncomment the line below to run it (it raises an error)
#seq = arange(10)

What you get instead is an error, also called a "traceback". In particular, Python raises a `NameError`, telling us that it cannot find anything called `arange` in the current session. This is because that particular function is only available in `numpy`.
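If you want to see the error without the cell failing, one option is to catch it with a `try`/`except` block. This is a small sketch (error handling is not otherwise covered in this tutorial) that captures the `NameError` message and then repeats the call correctly with `np.`:

```python
import numpy as np

try:
    seq = arange(10)  # `arange` is not defined outside numpy
except NameError as err:
    message = str(err)
    print("Error:", message)

seq = np.arange(10)  # works, because Python looks the function up in numpy
print(seq)
```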

### Variables

A basic feature of Python is the ability to assign a name to different "things", or objects. These are sometimes also called "variables". We have already seen this in the example above, but let us make it more explicit. For example, an object can be a single number:

In [18]:
a = 3

Or a piece of text, called a "string":

In [19]:
b = 'Hello World'

You can also easily check the type of an object:

In [20]:
type(a)

int

`int` is short for "integer" which, roughly speaking, means a whole number. If you want to save a number with decimals, you will be using floats:

In [21]:
c = 1.5
type(c)

float

As mentioned, text in a wide sense (letters, but also spaces and other signs) is stored as a "string" (`str` in short):

In [22]:
type(b)

str

### Help

A very handy feature of Python is the ability to access on-the-spot help for its different functions. This means that you can check what a function is supposed to do, or how to access it, right inside your Python session. Of course, this also works handsomely inside a notebook. There are a couple of ways to access the help. 

Take the `numpy` function `arange` that we have used above. The easiest way to check interactively how to use it is by:

In [23]:
np.arange?

[0;31mDocstring:[0m
arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range` function, but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not
be consistent.  It is better to use `numpy.linspace` for these cases.

Parameters
----------
start : number, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : number
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : number, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values,

As you can see, this brings up a sub-window in the browser with all the information you need.

If, for whatever reason, you needed to print that info into the notebook itself, you can do the following:

In [24]:
help(np.arange)

Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None)
    
    Return evenly spaced values within a given interval.
    
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range` function, but returns an ndarray rather than a list.
    
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use `numpy.linspace` for these cases.
    
    Parameters
    ----------
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
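Two points from this docstring can be checked directly: the interval is half-open (`stop` is excluded), and for non-integer steps `numpy.linspace` is recommended instead. A minimal sketch, with arbitrary example values:

```python
import numpy as np

# Half-open interval [start, stop): 2 is included, 10 is not
evens = np.arange(2, 10, 2)
print(evens)  # [2 4 6 8]

# For fractional spacing, linspace takes the number of values instead of a step;
# here: five evenly spaced values from 0 to 1, both ends included
grid = np.linspace(0, 1, 5)
print(grid)   # five values: 0, 0.25, 0.5, 0.75 and 1
```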
   

### Control flow (a.k.a. `for` loops and `if` statements)

Although this does not intend to be a comprehensive introduction to computer programming or general purpose Python (check the references for that, in particular Allen Downey's [book](http://www.greenteapress.com/thinkpython/thinkpython.html)), it is important to be aware of two building blocks of almost any computer program: `for` loops and `if` statements. It is possible that you will never need them for this course, as everything used here is based on existing methods and functions, but it is always useful to know they exist and to be able to recognize them. They can also come in very handy in cases where you need some extra functionality beyond standard methods. Without further ado, let us have a look at the two most relevant tools of computer programming.

* `for` loops

These allow you to repeat a particular action or task over a sequence. As an example, you can print your name ten times without having to type it yourself every single time:

In [3]:
for i in np.arange(10):
    print('my name')

my name
my name
my name
my name
my name
my name
my name
my name
my name
my name


Note a couple of features in the loop:

1. You loop *over* a sequence, in this particular case the sequence of ten numbers created by `np.arange(10)`.
1. In every step, for every element of the sequence in this case, you repeat an action. Here we are printing the same text, `my name`.
1. Although not used in this loop, each of the elements you loop over can be accessed inside the loop. Whether this matters depends on the context: it is irrelevant in the loop above, but it can be extremely useful. For example, see a case where you use the value of the sequence in each step:

In [26]:
for i in np.arange(10):
    print("I am at step ", i)

I am at step  0
I am at step  1
I am at step  2
I am at step  3
I am at step  4
I am at step  5
I am at step  6
I am at step  7
I am at step  8
I am at step  9


One more note: by convention, we are calling the element of the sequence `i`, but it could be named anything. In fact, in many cases, more meaningful names make code much more readable. For example, you could rewrite the loop above as:

In [27]:
for step in np.arange(10):
    print("I am at step ", step)

I am at step  0
I am at step  1
I am at step  2
I am at step  3
I am at step  4
I am at step  5
I am at step  6
I am at step  7
I am at step  8
I am at step  9


* `if` statements

We have just seen how `for` loops allow you to repeat an action over a sequence. `if` statements, in turn, allow you to select or restrict such actions to only those cases that meet the condition(s) you specify in the statement.

For example, thinking of the loops written above, you might want to print only the numbers that are odd, skipping those that are even:

In [28]:
for i in np.arange(10):
    if i%2:
        print(i)

1
3
5
7
9


Ignore for the moment the part `i%2`; just remember this is one way Python has to check whether a number is odd. The important bit in this loop, as compared to the simpler one above, is that we are using an `if` statement to select only those candidates that meet the condition. In other words, we are looping over every number in the sequence from zero to nine (`for i in np.arange(10)`) and checking whether each one is even or odd (`if i%2`). If a number meets the condition (that is, if it is odd), we print it on the screen.
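To see what `i%2` actually returns, the sketch below prints the remainder of each number divided by 2 (`%` is Python's remainder, or modulo, operator) and how that remainder behaves as a condition. As an illustration, it also counts the odd numbers:

```python
import numpy as np

n_odd = 0
for i in np.arange(10):
    remainder = i % 2   # 0 if i is even, 1 if i is odd
    # Inside an `if`, 0 behaves as False and any non-zero number as True
    if remainder:
        n_odd = n_odd + 1
    print(i, "% 2 =", remainder, "->", bool(remainder))

print("Odd numbers found:", n_odd)  # 5
```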

A full `if` statement also allows for an action to be taken if the original condition is not satisfied. This is called an `if`/`else` statement. For example, you can think of a loop that prints whether each number in a sequence is odd or even:

In [29]:
for i in np.arange(10):
    # Check if it is odd
    if i%2:
        print(i, ' is odd')
    # If not odd (even), then do the following
    else:
        print(i, ' is even')

0  is even
1  is odd
2  is even
3  is odd
4  is even
5  is odd
6  is even
7  is odd
8  is even
9  is odd


### Data structures

The standard Python you can access without importing any additional libraries contains a few core data structures that are very handy to know. Most data analysis is done on top of other structures specifically designed for the purpose (mostly numpy arrays and pandas dataframes; see the following sessions for more details), but some understanding of these core Python structures is very useful. In this context, we will look at three: values, lists, and dictionaries.

* **Values**: these are the most basic elements to organize data and information in Python. You can think of them as numbers (integers or floats) or words (strings). Typically, these are the elements that will be stored in lists and dictionaries.

An integer is a whole number:

In [30]:
i = 5
type(i)

int

A float is a number that allows for decimals:

In [31]:
f = 5.2
type(f)

float

Note that a number without decimals can also be stored as a float, simply by adding a decimal point:

In [32]:
fw = 5.
type(fw)

float

Note, however, that `f` and `fw` hold different values (5.2 and 5.0), so they are not equal:

In [33]:
f == fw

False
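The difference between an integer and a float shows up in their types, not necessarily in their values. A small sketch comparing the integer `5` with the float `5.`:

```python
i = 5     # integer
fw = 5.   # float without decimals

print(type(i))              # <class 'int'>
print(type(fw))             # <class 'float'>
print(i == fw)              # True: both represent the same value...
print(type(i) == type(fw))  # False: ...but with different types
```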

* **Lists**: a list is an ordered sequence of values that can be of mixed types. They are represented between square brackets (`[]`) and, although not very efficient in memory terms, are very flexible and useful to "put things together".

For example, the following list of integers:

In [34]:
l = [1, 2, 3, 4, 5]
l

[1, 2, 3, 4, 5]

In [35]:
type(l)

list

Or the following mixed one:

In [36]:
m = ['a', 'b', 5, 'c', 6, 7]
m

['a', 'b', 5, 'c', 6, 7]

Lists can be queried and sliced. For example, the first element can be retrieved by:

In [37]:
l[0]

1

Or the second to the fourth:

In [38]:
m[1:4]

['b', 5, 'c']
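Slicing follows the same half-open convention as `np.arange`: the start position is included and the stop position is not. A short sketch on the list `m` defined above (negative indices, which count from the end, are shown as an extra):

```python
m = ['a', 'b', 5, 'c', 6, 7]

print(m[1:4])  # ['b', 5, 'c']: positions 1, 2 and 3 (stop position 4 is excluded)
print(m[:2])   # ['a', 'b']: omitting the start means "from the beginning"
print(m[-1])   # 7: negative indices count from the end
```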

Lists can be concatenated (added together):

In [39]:
l + m

[1, 2, 3, 4, 5, 'a', 'b', 5, 'c', 6, 7]

New elements can be appended:

In [40]:
l.append(4)
l

[1, 2, 3, 4, 5, 4]

Or modified:

In [41]:
l[1]

2

In [42]:
l[1] = 'two'
l[1]

'two'

In [43]:
l

[1, 'two', 3, 4, 5, 4]
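Note that adding lists with `+` and appending with `.append` behave differently: `+` creates a new list and leaves the originals untouched, while `.append` modifies the list in place and returns `None`. A small sketch with fresh example lists:

```python
l = [1, 2, 3]
m = ['a', 'b']

combined = l + m      # `+` builds a new list; l and m are unchanged
print(combined)       # [1, 2, 3, 'a', 'b']
print(l)              # [1, 2, 3]

result = l.append(4)  # append changes l itself and returns None
print(l)              # [1, 2, 3, 4]
print(result)         # None
```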

* **Dictionaries**: dictionaries are collections of "keys" and "values". A key, which can be of any hashable type such as a string or a number, is the element associated with a "value", which can be of any kind. Dictionaries are used when position is not important but you need fast and convenient lookup by key (since Python 3.7 they do keep the order in which keys were inserted). They are expressed in curly brackets, with keys and values linked through colons (`:`).

For example, we can think of a dictionary to store a series of names and the ages of the people they represent:

In [44]:
ages = {'Ana': 24, 'John': 20, 'Li': 27, 'Ivan': 40, 'Tali':33}
ages

{'Ana': 24, 'John': 20, 'Li': 27, 'Ivan': 40, 'Tali': 33}

In [45]:
type(ages)

dict

Dictionaries can then be queried and values retrieved easily by using their keys. For example, if we quickly want to know Li's age:

In [46]:
ages['Li']

27

Similarly to lists, you can modify and assign new values:

In [47]:
ages['Juan'] = 73
ages

{'Ana': 24, 'John': 20, 'Li': 27, 'Ivan': 40, 'Tali': 33, 'Juan': 73}

Using this property, you can create entirely empty dictionaries and populate them later on:

In [48]:
newdict = {}
newdict['key1'] = 1
newdict['key2'] = 2
newdict

{'key1': 1, 'key2': 2}
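Populating an empty dictionary combines naturally with the `for` loops and `if` statements seen earlier. The sketch below is an illustrative example (not part of the original material) that labels each number as odd or even, then checks for a key with `in` and uses the `.get` method to supply a default for a missing key:

```python
import numpy as np

# Map each number to a label, reusing the odd/even check from above
parity = {}
for i in np.arange(5):
    if i % 2:
        parity[int(i)] = 'odd'   # int() turns the numpy number into a plain Python int
    else:
        parity[int(i)] = 'even'

print(parity)                     # {0: 'even', 1: 'odd', 2: 'even', 3: 'odd', 4: 'even'}
print(3 in parity)                # True: `in` checks the keys
print(parity.get(10, 'unknown'))  # unknown: .get returns the default for a missing key
```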

### Functions

The last part of this whirlwind tour of Python relates to functions (also called methods in this text, although strictly speaking a method is a function that belongs to an object, such as `l.append`). The motivation is that, so far, we have only seen how to create Python code that, if you want to run it again somewhere else, you need to copy and paste entirely. However, as we will see in more detail later in the course, one of the main reasons why you want to use Python for data analysis, instead of a point-and-click graphical interface like SPSS, is that you can easily reuse code and re-run analyses. Methods help us accomplish this by encapsulating snippets of code that perform a particular task and making them available to be called.

We have already *used* methods here: when we call `np.arange`, we are using one of them. Now, we will see how to *create* a method of our own that performs the specific task we want it to do. For example, let us create a method that reproduces one of the loops we wrote above, printing the numbers from zero to nine:

In [49]:
def run_loop():
    for i in np.arange(10):
        print(i)
    return None

Already with this method, there is a bunch of interesting things going on:

* First, note how we indicate that a bit of code is a method, as opposed to plain Python: we use `def` followed by the name of our function (we have chosen `run_loop`, but any other valid name would do).
* Second, we append `()` after the name, and finish the line with a colon (`:`). This is necessary and will later allow us to specify arguments for the function (see below).
* Third, note that everything inside a function needs to be indented. This is a core property of Python and, although some people find it odd, it greatly enhances readability.
* Fourth, the piece of code to do the task we want, printing the sequence of numbers, is inside the function in the same way it was outside, only properly indented.
* Fifth, we finish the method with a line starting by `return`. In this case, we follow it with `None`, but this will change as methods become more sophisticated. Essentially, this is the part of the method where you specify which elements you want it to return and save for later use.

Once we have paid attention to these elements, we can see how the method can be *called* and hence the code inside it executed:

In [50]:
run_loop()

0
1
2
3
4
5
6
7
8
9


This is the same way that we called `np.arange` before. Note how we do not include the colon (`:`) but simply use the name of the method followed by parentheses.

This method involves only a limited degree of complexity: it does not require any input, and the code produces an output (the printout) that is not saved anywhere. The rest of this section relaxes these two aspects to allow us to build more complex, but also more useful, methods.

First, you can specify "arguments" to be passed that modify the behaviour of the method. Remember how we called `np.arange` with a number that set the length of the sequence we wanted returned. We can do the same thing in our own function. The main aspect to pay attention to is that, in the definition, arguments are written as variable names, not particular values; the actual value is supplied when the method is called. Let us see a modified example of our method:

In [51]:
def run_loopX(x):
    for i in np.arange(x):
        print(i)
    return None

We have replaced the fixed length of the sequence (10) by a variable named `x` that allows us to specify *any value we want* when we call the method:

In [52]:
run_loopX(3)

0
1
2


In [53]:
run_loopX(2)

0
1


Another way you can build more flexibility into a method is by allowing it to return an output of the computation. In the previous examples, the function performs a computation (i.e. printing values on the screen), but it does not return any value. This is in contrast with, for example, `np.arange` which does return an output, the sequence of values:

In [54]:
a = np.arange(10)

In [55]:
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Our function, in contrast, does not return anything:

In [56]:
b = run_loopX(3)

0
1
2


In [57]:
b

Nothing is displayed because `b` is `None`: `run_loopX` prints values but returns nothing. We can change this using the last line of a method. For example, let us assume we want to return a sequence as long as the series of numbers we print on the screen. The method should be:

In [58]:
def run_loopXout(x):
    for i in np.arange(x):
        print(i)
    return np.arange(x)

Note the main difference: instead of returning `None`, we are telling Python to return a sequence that has the same length as the one used to specify the loop. There is also a more efficient way to write this method: assigning the sequence to a new object and reusing it where needed. The results are exactly the same, but fewer computations are performed and, more critically, we minimize the chances of making mistakes.

In [59]:
def run_loopXout(x):
    seq = np.arange(x)
    for i in seq:
        print(i)
    return seq

Now we can compare the two versions: `run_loopX` only prints the values, while `run_loopXout` also returns the sequence:

In [60]:
a = run_loopX(3)
b = run_loopXout(3)

0
1
2
0
1
2


In [61]:
a

In [62]:
b

array([0, 1, 2])
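Because `run_loopXout` returns the sequence, its output can be stored and reused in later computations. A minimal sketch, redefining the function so the block is self-contained and using the array's `.sum()` method as one example of such reuse:

```python
import numpy as np

def run_loopXout(x):
    seq = np.arange(x)
    for i in seq:
        print(i)
    return seq

b = run_loopXout(3)  # prints 0, 1 and 2, and stores the returned array in b
total = b.sum()      # the returned array can feed later computations
print(total)         # 3
```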

The advantage of methods, as opposed to straight code, is that they force us to think in a modular way, helping us identify exactly what needs to be done, in what order, and what is required. Encapsulating these atomic bits of functionality in methods allows us to write things once and flexibly use them everywhere, saving us time (and headaches) in the long run.

A final note on functions. It is important that, whenever you create a function, you include some documentation about what it expects, what it does, and what it returns. Although there are many ways of doing this, the typical convention is as follows:

In [63]:
def run_loopXout(x):
    """
    Print out the values of a sequence of certain length
    ...
    
    Arguments
    ---------
    x     : int
            Length of the sequence to be printed out
    
    Returns
    -------
    seq   : np.array
            Sequence of values printed out
    """
    seq = np.arange(x)
    for i in seq:
        print(i)
    return seq

Like any string, documentation is highlighted in red in a notebook. Let us have a look at the structure and components of well-made documentation (also called a "docstring"):

* It is enclosed in triple quotes (`"""`).
* It begins with a short description of what the method does. The shorter and more concise, the better.
* There is a section called "Arguments" that lists the elements that the function expects. 
* Each argument is then listed, followed by its type. In this case it is an object `x` that, as we are told, needs to be an integer.
* The arguments are followed by another section that specifies what the function returns, and of what type the output is.

Documentation in this way is very useful to remember what a function does, but also to force yourself to write clearer code. A bonus is that, if you include documentation in this way, it can be checked with the standard `help` or `?` systems reviewed above:

In [64]:
run_loopXout?

[0;31mSignature:[0m [0mrun_loopXout[0m[0;34m([0m[0mx[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Print out the values of a sequence of certain length
...

Arguments
---------
x     : int
        Length of the sequence to be printed out

Returns
-------
seq   : np.array
        Sequence of values printed out
[0;31mFile:[0m      ~/host/content/labs/<ipython-input-63-5ed12024ff6b>
[0;31mType:[0m      function


In [65]:
help(run_loopXout)

Help on function run_loopXout in module __main__:

run_loopXout(x)
    Print out the values of a sequence of certain length
    ...
    
    Arguments
    ---------
    x     : int
            Length of the sequence to be printed out
    
    Returns
    -------
    seq   : np.array
            Sequence of values printed out
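Both `?` and `help` read the docstring, which Python stores as a plain string in the function's `__doc__` attribute. A short sketch, with the docstring shortened to its first line:

```python
import numpy as np

def run_loopXout(x):
    """
    Print out the values of a sequence of certain length
    """
    seq = np.arange(x)
    for i in seq:
        print(i)
    return seq

# The docstring is kept as a string attribute of the function object
doc = run_loopXout.__doc__
print(doc.strip())  # Print out the values of a sequence of certain length
```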



## Javascript
[JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript) is one of the world's most widely used programming languages. It is one of the core technologies of web development and can be used on both the front end and the back end. Do not confuse JavaScript with the Java programming language. Both "Java" and "JavaScript" are trademarks or registered trademarks of Oracle in the U.S. and other countries. However, the two programming languages have very different syntax, semantics, and uses.

<center><img src="https://www.tutorialrepublic.com/lib/images/javascript-illustration.png" width="1000"></center>

Two of the most widely used cloud platforms for geospatial analysis, Google Earth Engine and Sentinel Hub, use [JavaScript](https://javascript.info/intro).

### Google Earth Engine (GEE)
Google Earth Engine (GEE) is a cloud-based platform that enables large-scale scientific analysis
and visualization of geospatial data sets. GEE was launched in 2010 by Google as a proprietary system.
Currently, it is available to users as a free service for small and medium workloads, using a business
model similar to the other cloud-based services of the company.
This platform is built from a collection of technologies available on Google’s infrastructure,
such as the large-scale computer cluster management system (Borg), the distributed databases (Bigtable
and Spanner), the distributed file system (Colossus) and the parallel pipeline execution framework
FlumeJava.


<center><img src="https://images.squarespace-cdn.com/content/v1/5344b442e4b058c292aa490c/1586648476995-V9L18VS4UZTI6LIBMA31/ke17ZwdGBToddI8pDm48kGLSsGHlYP8a-s1Bq8jyLU17gQa3H78H3Y0txjaiv_0fDoOvxcdMmMKkDsyUqMSsMWxHk725yiiHCCLfrh8O1z4YTzHvnKhyp6Da-NYroOW3ZGjoBKy3azqku80C789l0kh8CFOf_fg_fcBdg5WGu2PEoOcXRJZ14gZ3GKz5pSWfiZfyUPE659Grxueqy-rvwg/image-asset.png?format=1000w" width="1000"></center>

Google Earth Engine allows users to run algorithms on georeferenced imagery and vectors stored on Google's infrastructure. The Google Earth Engine API provides a library of functions which may be applied to data for display and analysis. Earth Engine's public data catalog contains a large amount of publicly available imagery and vector datasets. Private assets can also be created in users' personal folders.

GEE provides a data catalog that stores a large repository of geospatial data,
including optical imagery from a variety of satellites and airborne systems, environmental variables, weather
and climate forecasts, land cover, socioeconomic and topographic datasets. Before being made available,
these data sets are preprocessed, enabling efficient access and removing many barriers associated with
data management.

GEE uses four object types to represent data that can be manipulated by its API. The Image
type represents raster data that can consist of one or more bands, which contain a name, data type,
scale, and projection. A stack or a time series of Images is represented by the ImageCollection type.
GEE represents vector data through the Feature type. This type is represented by a geometry (point,
line, or polygon) and a list of attributes. The FeatureCollection type represents groups of related
Features and provides functions to manipulate this data, such as sorting, filtering, and visualization. GEE only offers programming interfaces that support pixel-based processing. The results of GEE processing can be viewed in the web IDE or saved to one of three Google services: Drive, Cloud Storage, or Assets. GEE uses a tile server to make data available to the web
interface efficiently.

GEE provides a JavaScript API and a Python API for data management and analysis. For the
JavaScript version, a web Integrated Development Environment [(IDE)](https://code.earthengine.google.com) is also provided, where the user has easy access to available data, applications and real-time
visualization of the processing results. The Python API is available through a module and has a
structure similar to its JavaScript version.

<center><img src="http://unescowe.sharif.ir/wp-content/uploads/2020/03/GEE1-1.png" width="500"></center>

#### The Code Editor 
The Code Editor is an interactive environment for developing Earth Engine applications. The center panel provides a JavaScript code editor. Above the editor are buttons to save the current script, run it, and clear the map. The Get Link button generates a unique URL for the script in the address bar. The map in the bottom panel contains the layers added by the script. At the top is a search box for datasets and places. The left panel contains code examples, your saved scripts, a searchable API reference and an asset manager for private data. The right panel has an inspector for querying the map, an output console, and a manager for long-running tasks. The help button in the upper right contains links to the Earth Engine guides and other resources for getting help. Learn more from the [Code Editor guide](https://developers.google.com/earth-engine/guides/playground).

The Earth Engine (EE) Code Editor at [code.earthengine.google.com](https://code.earthengine.google.com/) is a web-based IDE for the Earth Engine JavaScript API. Code Editor features are designed to make developing complex geospatial workflows fast and easy. The Code Editor has the following elements (illustrated in the diagram below):

- JavaScript code editor
- Map display for visualizing geospatial datasets
- API reference documentation (Docs tab)
- Git-based Script Manager (Scripts tab)
- Console output (Console tab)
- Task Manager (Tasks tab) to handle long-running queries
- Interactive map query (Inspector tab)
- Search of the data archive or saved scripts
- Geometry drawing tools

The steps below demonstrate how to open Earth Engine and execute a custom script that displays an image. For best results, you may want to install the latest version of Chrome, Google's web browser.

1. Open the Earth Engine Code Editor here: [code.earthengine.google.com](https://code.earthengine.google.com/). If you have not already, you will need to enable access by logging in using a registered Google account.
2. Navigate to the Scripts tab located on the far left of the Code Editor. There you will find a collection of example scripts that access, display, and analyze Earth Engine data.
3. Under “Image Collection,” select the “Filtered Composite” example. You will see a script appear in the center panel.
4. Press the Run button to execute the script. The Filtered Composite example selects Landsat 7 images that intersect or are within the boundaries of Colorado and Utah. It then displays a true color composite of the selected images. The examples introduce you to commonly used methods, such as filter(), clip(), and Map.addLayer().

The Code Editor has a variety of features to help you take advantage of the Earth Engine API. View example scripts or save your own scripts on the Scripts tab. Query objects placed on the map with the Inspector tab. Display and chart numeric results using the Google Visualization API. Share a unique URL to your script with collaborators and friends with the Get Link button. Scripts you develop in the Code Editor are sent to Google for processing and the generated map tiles and/or messages are sent back for display in the Map and/or Console tab. All you need to run the Code Editor is a web browser (use Google Chrome for best results) and an internet connection. The following sections describe elements of the Earth Engine Code Editor in more detail.

<center><img src="https://developers.google.com/earth-engine/images/Code_editor_diagram.png"></center>

#### The Earth Engine Python API
The Earth Engine Python API is distributed as a conda-forge package at: https://anaconda.org/conda-forge/earthengine-api. It is installed with the conda install command. Before installing, however, make a conda environment specifically for Earth Engine. Installing the Earth Engine API to its own environment ensures that it and its dependent packages will not cause versioning issues with your base environment or any other environment you've previously set up and vice versa. For more information on managing conda environments, please visit this [site](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html).

1. Activate your base conda environment, if it is not already.
2. Make a conda virtual environment for the Earth Engine API.

```
conda create --name ee
```
3. Activate the conda ee environment.

```
conda activate ee
```
4. Install the API into the conda ee environment. Ensure that (ee) appears at the beginning of the command line, indicating you are working from the ee environment.

```
conda install -c conda-forge earthengine-api
```
Before using the Earth Engine API or earthengine command line tool, you must perform a one-time authentication that authorizes access to Earth Engine on behalf of your Google account. To authenticate, use the authenticate command from the earthengine command line tool.

Within your conda ee environment run the following command and follow the resulting printed instructions. A URL will be provided that generates an authorization code upon agreement. Copy the authorization code and enter it as command line input.
```
earthengine authenticate
```
Upon entering the authorization code, an authorization token is saved to a credentials file. Subsequent calls to the API's `ee.Initialize()` command and the earthengine command line tool will use this file to authenticate. If you want to revoke authorization, simply delete the credentials file.
```
import ee

ee.Initialize()
```
Any time you want to use `ee` in a Python session:
```
import ee

# Initialize the Earth Engine module.
ee.Initialize()

# Print metadata for a DEM dataset.
print(ee.Image('USGS/SRTMGL1_003').getInfo())
```

#### Sentinel Hub
[Sentinel Hub](https://www.sentinel-hub.com/) (SH) is a platform developed by Sinergise that provides Sentinel data access and
visualization services. This is a private platform with public access (https://www.sentinel-hub.com).
Unlike Google Earth Engine, SH limits access to functionality in different payment plans. The free plan
only allows viewing, selection and downloading of raw data. Paid access enables data access through
OGC protocols and a specific API, data processing, mobile application data access, higher resource
access limits, and technical support.

<center><img src="https://www.sinergise.com/sites/default/files/styles/content_image/public/field/image/sent_thumbnail4.jpg?itok=ESzvtSJZ"></center>

[SH](https://www.sentinel-hub.com/explore/) provides two well-known remote sensing web applications ([EO Browser](https://apps.sentinel-hub.com/eo-browser/?zoom=10&lat=41.9&lng=12.5&themeId=DEFAULT-THEME) and [Sentinel Playground](https://apps.sentinel-hub.com/sentinel-playground/)). Sentinel Playground is a simple web viewer for rapid online viewing of the Sentinel-1, Sentinel-2, Landsat 8 and MODIS image archives, while EO Browser lets you take advantage of powerful scripting functionality, explore 12 unique themes, create timelapses and download high-resolution images, all for the complete archive of SH data.

<center><img src="http://www.gisandbeers.com/wp-content/uploads/2017/08/Sentinel-Playground.jpg"></center>

EO Browser makes it possible to browse and compare full-resolution images from all the data sources that Sentinel Hub provides. You simply go to your area of interest, select your desired time range and cloud coverage, and inspect the resulting data in the browser. You can try out different visualizations or make your own, download high-resolution images and create timelapses. Satellite imagery in EO Browser can be visualized based on the user's desired configuration. Several visualizations with legends and descriptions are already prepared, such as true color, false color, NDVI and EVI. By choosing Custom, it is possible to select any combination of bands and make a composite by simply dragging and dropping the bands into the RGB channels. The index tool makes it possible to quickly create remote sensing indices and control the visualization by dragging and dropping bands into the equation. The custom script functionality is a powerful tool for visualizing satellite data: using JavaScript, you have full control over your visualization, allowing you to perform computations, use logical operators and conditions, fuse data, write multitemporal scripts, etc. Additionally, you can modify your image by editing the strength of the three color channels, the contrast (gain) and the luminance (gamma) in the effects panel.

<center><img src="https://www.sentinel-hub.com/img/glacier.jpg" width="800"></center>

To obtain a true color image, the following script can be used:

```
// true color with enhanced brightness (*) and contrast (-), and transparency as the fourth value (0.5)
return[(2.5*B04)-0.1,(2.5*B03)-0.1,(2.5*B02)-0.1,0.5]
```
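To see what this band arithmetic does outside Sentinel Hub, it can be reproduced with NumPy on a few made-up reflectance values. This is a sketch under assumptions: the arrays below are hypothetical (not real Sentinel-2 data), the transparency channel is left out, and results are clipped to [0, 1], assuming that is the display range.

```python
import numpy as np

# Hypothetical reflectance values for a 2 x 2 pixel patch (illustration only)
B04 = np.array([[0.05, 0.10], [0.20, 0.50]])  # red
B03 = np.array([[0.06, 0.12], [0.18, 0.25]])  # green
B02 = np.array([[0.02, 0.08], [0.15, 0.22]])  # blue

gain, offset = 2.5, 0.1

# Same arithmetic as the custom script: multiply to brighten, subtract the offset
rgb = np.stack([gain * B04 - offset,
                gain * B03 - offset,
                gain * B02 - offset], axis=-1)

# Assumed display range [0, 1]: values outside it are clipped
rgb = np.clip(rgb, 0, 1)
print(rgb.shape)  # (2, 2, 3)
```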
To calculate the NDVI and generate categories, you can use:
```
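// Categorized NDVI
var NDVI = (B08 - B04) / (B08 + B04);

if (NDVI < 0.2) {
  return [1, 1, 1];    // white for NDVI < 0.2
} else if (NDVI < 0.4) {
  return [0, 1, 0];    // green for 0.2 <= NDVI < 0.4
} else if (NDVI < 0.7) {
  return [0, 0.5, 0];  // dark green for 0.4 <= NDVI < 0.7
} else {
  return [0, 0, 0];    // black for NDVI >= 0.7
}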
```
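The same classification can be sketched with NumPy, again on hypothetical reflectance values for four pixels. `np.select` evaluates the conditions in order and takes the first one that is true, which mirrors the chain of `if` statements in the script; the colors are the same RGB triplets.

```python
import numpy as np

# Hypothetical red (B04) and near-infrared (B08) reflectances for four pixels
B04 = np.array([0.30, 0.10, 0.08, 0.03])
B08 = np.array([0.32, 0.20, 0.40, 0.50])

ndvi = (B08 - B04) / (B08 + B04)

# Thresholds from the custom script; the first true condition wins
conditions = [ndvi < 0.2, ndvi < 0.4, ndvi < 0.7]
classes = np.select(conditions, [0, 1, 2], default=3)

# One RGB triplet per class, as returned by the custom script
colors = np.array([[1, 1, 1], [0, 1, 0], [0, 0.5, 0], [0, 0, 0]])
rgb = colors[classes]

print(np.round(ndvi, 2))
print(classes)  # [0 1 2 3]
print(rgb)
```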

### Additional sources

- [The official NumPy documentation](https://docs.scipy.org/doc/numpy/reference/index.html)
- [The official Matplotlib Gallery](https://matplotlib.org/gallery/index.html)
- [The official Matplotlib Tutorials](https://matplotlib.org/tutorials/index.html)


- Rougier, N.P., 2016. [From Python to NumPy](http://www.labri.fr/perso/nrougier/from-python-to-numpy/).
- Oliphant, T.E., 2015. [A Guide to NumPy: 2nd Edition](https://www.amazon.com/Guide-NumPy-Travis-Oliphant-PhD/dp/151730007X). USA: Travis Oliphant, independent publishing.
- Varoquaux, G., Gouillart, E., Vahtras, O., Haenel, V., Rougier, N.P., Gommers, R., Pedregosa, F., Jędrzejewski-Szmek, Z., Virtanen, P., Combelles, C. and Pinte, D., 2015. [SciPy Lecture Notes](http://www.scipy-lectures.org/intro/numpy/index.html).