## Advanced Lesson on Spatial Modeling and Analytics
### Segment 1 of 5
# Introduction to R

## Thank you for helping our study


<a href="#/slide-1-0" class="navigate-right" style="background-color:blue;color:white;padding:8px;margin:2px;font-weight:bold;">Continue with the lesson</a>

Throughout this lesson you will see reminders, like the one below, to ensure that all participants understand that they are in a voluntary research study.

### Reminder

<font size="+1">

By continuing with this lesson you are granting your permission to take part in this research study for the Hour of Cyberinfrastructure: Developing Cyber Literacy for GIScience project. In this study, you will be learning about cyberinfrastructure and related concepts using a web-based platform that will take approximately one hour per lesson. Participation in this study is voluntary.

Participants in this research must be 18 years or older. If you are under the age of 18 then please exit this webpage or navigate to another website such as the Hour of Code at https://hourofcode.com, which is designed for K-12 students.

If you are not interested in participating please exit the browser or navigate to this website: http://www.umn.edu. Your participation is voluntary and you are free to stop the lesson at any time.

For the full description please navigate to this website: <a href="../../gateway-lesson/gateway/gateway-1.ipynb">Gateway Lesson Research Study Permission</a>.

</font>

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

import warnings
warnings.filterwarnings('ignore') # Hide warnings

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')


## Comparing R and Python

<table>
    <tr style="background: #fff; text-align: left; vertical-align:">
        <td style="background: #fff; text-align: left; font-size: 23px;">
            R offers a vast number of discipline-specific data science packages
<ul>
    <li>
Biostatistics
    </li>
    <li>
Geostatistics
    </li>
    <li>
Econometrics
    </li>
</ul>
          
Python has general-purpose data science libraries
<ul>
    <li>
Deep learning (TensorFlow, …)
    </li>
    <li>
Machine learning (scikit-learn, …)
    </li>
    <li>
Used for both analysis and building scalable software
    </li>
</ul>
</td>
     <td style="width: 50%; background: #fff; text-align: left; vertical-align: top;"> <img src='supplementary/r_python.png' width="500" height="700" alt='map'>
        </td>
    </tr>
</table>










## Strengths of the R Language

<ul>
    <li>
Breadth of available packages
    </li>
    <ul>
        <li>
18,595 packages as of September 2022
        </li>
    </ul>
    <li>
Discipline-specific data science functionality
    </li>
    <li>
Gentle learning curve
    </li>
    <li>
Analysis ecosystem built for 
    </li>
    <ul>
        <li>
Statistical analysis
        </li>
        <li>
Open science
        </li>
    </ul>
</ul>
    



## Weaknesses of the R Language

<ul>
    <li>
        <b>Speed</b>
    </li>
    <ul>
        <li>
R code often runs slower than comparable Python code
        </li>
        <li>
Explicit loops are notoriously slow
        </li>
    </ul>
    <li>
        <b>Memory</b>
    </li>
    <ul>
        <li>
R is a single-threaded programming language
        </li>
        <ul>
            <li>
It uses a single CPU core at a time
            </li>
            <li>
Packages for multithreading exist
            </li>
            <ul>
                <li>
Not a part of base R
                </li>
            </ul>
        </ul>
        <li>
Memory bottlenecks occur frequently with medium-sized (1 to 2 GB) data
        </li>
        <li>
R is less forgiving of inefficient code than Python
        </li>
        <ul>
            <li>
Avoid loops as much as possible!
            </li>
        </ul>
    </ul>
</ul>
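
The advice to avoid loops can be illustrated with a minimal sketch. The function names `loop_squares` and `vec_squares` are made up for this example, and the timings it prints will differ from machine to machine, so treat it as an illustration and not as a benchmark.

```r
# One million numeric values to work on.
x <- seq_len(1e6) / 1e6

# Explicit loop: R interprets every iteration one at a time.
# The result vector is preallocated so it is not copied on each step.
loop_squares <- function(v) {
  out <- numeric(length(v))
  for (i in seq_along(v)) {
    out[i] <- v[i]^2
  }
  out
}

# Vectorized arithmetic: the same work is done in a single call,
# with the iteration happening in compiled code.
vec_squares <- function(v) v^2

# Both approaches give the same answer.
stopifnot(isTRUE(all.equal(loop_squares(x), vec_squares(x))))

# Compare elapsed times; the exact numbers depend on the machine.
print(system.time(loop_squares(x)))
print(system.time(vec_squares(x)))

# Size of the input vector in memory (each double takes 8 bytes).
print(format(object.size(x), units = "Mb"))
```

The vectorized form is also shorter, which is why idiomatic R code tends to operate on whole vectors instead of individual elements.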




## CRAN [(The Comprehensive R Archive Network)](https://cran.r-project.org/)

<ul>
    <li>
Network of servers that distribute R
    </li>
    <ul>
        <li>
Executables, source code and documentation
        </li>
    </ul>
    <li>
Body that sets policy and enforces quality control for R packages
    </li>
    <ul>
        <li>
Ensuring new packages are open-source
        </li>
        <li>
Upholding documentation quality
        </li>
        <li>
Making sure every R package on CRAN works!
        </li>
    </ul>
    <li>
One-stop shop for downloading R
    </li>
</ul>




## The R Ecosystem


<center><img src='supplementary/r_eco.png' width="500" height="700" alt='map'></center>


## The R Ecosystem: CRAN

<ul>
    <li>
CRAN hosts the vast majority of R packages and their documentation
    </li>
    <li>
Any CRAN package can be installed with:
    </li>
    <ul>
        <li>
install.packages("package_name")
        </li>
    </ul>
    <li>
Provides a complete <a href="https://cran.r-project.org/web/packages/available_packages_by_name.html">list of all available packages</a>
    </li>
</ul>



## The R Ecosystem: Bioconductor
<ul>
    <li>
Serves packages related to bioinformatics
    </li>
    <ul>
        <li>
Extensive package list for fields such as genomics
        </li>
    </ul>
    <li>
More specific in its scope compared to CRAN
    </li>
    <li>
As of September 2022, it serves 2,140 packages
    </li>
    <li>
Requires an R package, BiocManager, to install packages
    </li>
    <ul>
        <li>
BiocManager::install("package_name")
        </li>
    </ul>
</ul>
    

## The R Ecosystem: GitHub

<ul>
    <li>
Hosts code not contributed to CRAN or Bioconductor
    </li>
    <li>
This code is not peer-reviewed and may lack documentation
    </li>
    <li>
Used frequently for personal or ongoing projects
    </li>
    <li>
An R package, devtools, can be used to install R packages from GitHub directly
    </li>
    <ul>
        <li>
devtools::install_github("owner_name/repo_name")
        </li>
    </ul>
</ul>
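
The three installation routes (CRAN, Bioconductor, and GitHub) can be sketched side by side. This is only an illustration: `missing_packages` is a hypothetical helper written for this example, the package names `spatstat`, `GenomicRanges`, and `r-spatial/sf` are illustrative choices, and the `run_installs` flag is an assumption added so that nothing is downloaded unless you opt in.

```r
# Hypothetical helper (not part of any package): return the names of the
# requested packages that are not installed in the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Set to TRUE to actually download and install (requires network access).
run_installs <- FALSE

if (run_installs) {
  # CRAN: install.packages() ships with R.
  cran_needed <- missing_packages("spatstat")
  if (length(cran_needed) > 0) install.packages(cran_needed)

  # Bioconductor: BiocManager itself is installed from CRAN first.
  if (length(missing_packages("BiocManager")) > 0) install.packages("BiocManager")
  BiocManager::install("GenomicRanges")

  # GitHub: devtools is also installed from CRAN first.
  if (length(missing_packages("devtools")) > 0) install.packages("devtools")
  devtools::install_github("r-spatial/sf")
}

# Safe to run anywhere: "stats" ships with R, the second name is made up.
print(missing_packages(c("stats", "notARealPackage123")))
```

Checking what is missing before installing avoids re-downloading packages that are already present.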



## R Vignettes

<ul>
    <li>
Long-form description of a package
    </li>
    <li>
Structured like an academic paper
    </li>
    <ul>
        <li>
Introduces method(s) implemented
        </li>
        <li>
Showcases use on sample problems
        </li>
        <li>
Introduces use of function parameters
        </li>
    </ul>
    <li>
<a href="https://cran.r-project.org/web/packages/spatstat/vignettes/getstart.pdf">Example vignette:</a>
    </li>
    <ul>
        <li>
A vignette for spatstat, a commonly used spatial statistics package
        </li>
    </ul>
</ul>
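
Vignettes can also be listed and opened from inside an R session. The sketch below uses the `grid` package only because it ships with R; the commented-out lines assume that spatstat is installed and that its vignette is named `getstart`, as the PDF link above suggests.

```r
# List the vignettes bundled with one installed package.
# "grid" ships with R, so nothing needs to be installed first.
vigs <- vignette(package = "grid")

# The result holds a matrix with one row per vignette.
print(vigs$results[, c("Item", "Title"), drop = FALSE])

# Opening a vignette launches a PDF or HTML viewer, so these calls are
# left commented out (they assume spatstat is installed):
# vignette("getstart", package = "spatstat")
# browseVignettes("spatstat")
```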








## Task Views

<ul>
    <li>
A curated list of R packages organized around a theme
    </li>
    <li>
Task views group R packages by common analysis type within a theme
    </li>
    <li>
        <a href="https://cran.r-project.org/web/views/Spatial.html">Analysis of Spatial Data task view</a>
    </li>
    <ul>
        <li>
Contains a wide array of R packages
        </li>
        <li>
Groups packages with respect to their use in different stages of spatial analysis
        </li>
    </ul>
</ul>
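
A task view can also be installed as a group from within R. The sketch below assumes the CRAN package `ctv`, which is not mentioned in this lesson, and uses a made-up helper `task_view_url` that simply follows the URL pattern of the link above. The `run_installs` flag is likewise an assumption so that nothing is downloaded by default.

```r
# Hypothetical helper: build the CRAN web address of a task view by name,
# following the pattern of the Spatial task view link.
task_view_url <- function(name) {
  sprintf("https://cran.r-project.org/web/views/%s.html", name)
}

print(task_view_url("Spatial"))

# Set to TRUE to actually download and install (requires network access).
run_installs <- FALSE

if (run_installs) {
  # The ctv package provides tools for working with CRAN task views.
  install.packages("ctv")

  # Install only the core packages of the Spatial task view;
  # coreOnly = FALSE would install every package it lists.
  ctv::install.views("Spatial", coreOnly = TRUE)
}
```

Installing a full task view can take a long time, so starting with the core packages is a cautious default.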



## R Documentation

<ul>
    <li>
Reference documentation is available for every CRAN package, both as a PDF and as live help within R
    </li>
    <li>
The PDF version can be found on the CRAN page of a package
    </li>
     <ul>
        <li>
            See <a href="https://cran.r-project.org/web/packages/spatstat/index.html">spatstat example</a>
        </li>
    </ul>
    <li>
Targeted help about a function can be obtained via the R code:
    </li>
    <ul>
        <li>
?function_name
        </li>
    </ul>
    <li>
Keyword search for a phrase or concept (use quotes for multiple words)
    </li>
    <ul>
        <li>
??"geographically weighted regression"
        </li>
    </ul>
</ul>
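
The two help operators are shorthand for ordinary functions, which makes them easy to try in a script. The sketch below uses `mean` and the phrase "linear model" purely as examples; the search is restricted to the `stats` package, an arbitrary choice made to keep it fast.

```r
# ?mean is shorthand for help("mean"). Storing the result does not open
# the help page; printing it (or typing ?mean at the prompt) does.
help_entry <- help("mean")

# ??"linear model" is shorthand for help.search("linear model").
hits <- help.search("linear model", package = "stats")
print(head(hits$matches[, c("Topic", "Package")]))

# apropos() finds objects by name when only part of the name is known.
print(apropos("^mean"))
```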




<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="sma-3.ipynb">Click here to go to the next notebook.</a></font>