# Connecting to Data Sources


## Key Questions

- How do we store data?
- How do we structure the data we store?
- How do we access the data we store?

## Key Concepts and Definitions

- script
- command line interface
- concatenating data
- relational data
- join
- key
- SQL
- SELECT
- Database Server
- Datastore
- ODBC - Open Database Connectivity
- API
- Data Warehouse
- Web Scraping
- Parser

## Constructing Datasets
### Why do we need to combine data?
- different collection points
        - multiple survey agents
        - multiple weather stations
- related data
        - different schools with information about the same student
        - different medical providers with information about the same patient
- information collected at different "levels"
        - combining survey responses by country with information about country's population or other national indicators
        
### How do we combine data so that it has meaning: Data models.
> A data model (or datamodel) is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities. For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner.

https://en.wikipedia.org/wiki/Data_model

#### A Simple Data Model
At the simplest level, this is just the attributes and characteristics (variables) we care about for every instance (observation) of a phenomenon. A simple data model of a student:

Each Student is composed of:
- First Name
- Last Name
- birth year
- last standardized test score
- response to question: Do you like school?
- response to question: Is America great?
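
As a minimal sketch, the student model above could be written as a Python dataclass. The field names, the types (for example, storing the two survey responses as booleans), and the two sample students are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Student:
    """One observation (row) in the simple student data model."""
    first_name: str
    last_name: str
    birth_year: int
    last_test_score: float
    likes_school: bool      # response to "Do you like school?" (assumed yes/no)
    america_is_great: bool  # response to "Is America great?" (assumed yes/no)


# Two made-up observations, purely for illustration.
students = [
    Student("Ada", "Lovelace", 2005, 91.5, True, True),
    Student("Alan", "Turing", 2006, 88.0, False, True),
]

# Every instance carries the same variables, so the list behaves like a table.
average_score = sum(s.last_test_score for s in students) / len(students)
print(average_score)
```

Because every student has the same attributes, each one maps directly to a row in a table with six columns.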

#### Hierarchical Data Models
As we see from XML and JSON, some data models can be hierarchical:

Book
- Title
- Genre
        - Title
        - Sales for Genre last year
        - Definition of Genre on Wikipedia
- Author
        - Name
        - Total number of books authored
        - Birth Year
        - Birth State
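
One way to picture this hierarchy is as a nested Python dictionary, which serializes directly to JSON. This is only a sketch: the key names and every value below are invented for illustration.

```python
import json

# A hypothetical book record; all values are invented for illustration.
book = {
    "title": "Example Book",
    "genre": {
        "title": "Mystery",
        "sales_last_year": 1000,
        "definition": "Fiction centered on solving a puzzle or crime.",
    },
    "author": {
        "name": "Jane Doe",
        "book_count": 3,
        "birth_year": 1970,
        "birth_state": "Hawaii",
    },
}

# Nested attributes are reached by walking down the hierarchy.
author_name = book["author"]["name"]

# The same structure converts to JSON text and back without loss.
as_json = json.dumps(book, indent=2)
round_tripped = json.loads(as_json)
print(author_name)
```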


#### Flattening Hierarchies for Tabular Representation
These hierarchies can be *"flattened"* to fit into a table (where the data attributes are column headers):

Book
- Title
- Genre title
- Genre sales
- Genre def
- Author Name
- Author Book Count
- Author Birth Year
- Author Birth State
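
A rough sketch of flattening, assuming the hierarchy is held as nested dictionaries: the helper `flatten` is a hypothetical function written for this example, and the underscore-prefixed column names are just one possible naming convention.

```python
def flatten(record, parent_key=""):
    """Turn nested dicts into one flat dict with prefixed column names."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}_{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested level, carrying the prefix along.
            flat.update(flatten(value, column))
        else:
            flat[column] = value
    return flat


# Invented sample data, smaller than the full Book model above.
book = {
    "title": "Example Book",
    "genre": {"title": "Mystery", "sales": 1000},
    "author": {"name": "Jane Doe", "birth_year": 1970},
}

row = flatten(book)
print(list(row))
```

Each key of `row` becomes a column header, so a list of such rows can be written out as a CSV file or loaded into a spreadsheet.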

#### Relational Data Models
But you can also see how some information could be redundant if multiple books were written by the same author or were in the same genre. An alternative **"relational"** data model addresses the redundancy by storing each author and genre once and referring to them by ID:

Book
- Title
- GenreID
- AuthorID

Author
- AuthorID
- Author Name
- Author Book Count
- Author Birth Year
- Author Birth State

Genre
- GenreID
- Genre Title
- Genre Sales
- Genre Def

Combining different elements from a data model to come up with flatter table structures is often referred to as **"joining"**, particularly in the context of SQL databases. Joining data, that is, relating records based on common keys, is a frequent task when combining datasets.
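
The relational model and a join can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table and column names below are assumptions loosely based on the Book / Author / Genre model, and the rows are invented.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT, birth_state TEXT);
    CREATE TABLE genre  (genre_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE book   (title TEXT, genre_id INTEGER, author_id INTEGER);

    INSERT INTO author VALUES (1, 'Jane Doe', 'Hawaii');
    INSERT INTO genre  VALUES (1, 'Mystery'), (2, 'Travel');
    INSERT INTO book   VALUES ('First Book', 1, 1), ('Second Book', 2, 1);
""")

# Join the three tables on their common keys to rebuild a flat table.
rows = conn.execute("""
    SELECT book.title, genre.title, author.name
    FROM book
    JOIN genre  ON book.genre_id  = genre.genre_id
    JOIN author ON book.author_id = author.author_id
    ORDER BY book.title
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

The author's details are stored once in `author`, yet they appear on every joined row for that author's books.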

## Full Technology Stack

- https://www.quora.com/What-is-a-technology-stack
- https://blog.hubstaff.com/technology-stack/

Computing is done in **layers**. Each layer provides **functionality** that supports an **abstraction** for the layer above it.

### What is a basic example of an abstraction?
Computing at the most basic level is electrical state. Transistors function as switches using electricity: on or off. These on or off states can be "abstracted", or **conceptually mapped**, to "0"s and "1"s, allowing the electrical state to be interpreted and manipulated as numbers in a binary numbering system.
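
A small sketch of that mapping, using a made-up list of eight switch states. Reading the value as a character assumes the ASCII encoding.

```python
# Eight hypothetical switch states: True = on, False = off.
switches = [False, True, False, False, False, False, False, True]

# Map each electrical state to a binary digit.
bits = "".join("1" if on else "0" for on in switches)

# Interpret the digits as a base-2 number, then view it in other ways.
value = int(bits, 2)           # as a decimal integer
as_hex = format(value, "02x")  # as two hexadecimal digits
as_char = chr(value)           # as a character (ASCII assumed)
print(bits, value, as_hex, as_char)
```

The same eight switch states are read as `01000001`, as 65, as hexadecimal `41`, and as the letter `A`, depending on which abstraction is applied.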

### How does that build into the computer applications we know today?
Through a technology stack:

1. Binary digits -> grouped into larger binary numbers, which can be written more compactly in hexadecimal
2. Hexadecimal numbers -> mapped to memory addresses and to instructions that turn other switches on and off
3. CPU / integrated circuit organizes transistors to respect an instruction set -> CPU with machine code
4. Hexadecimal instructions are mapped to English words -> assembly language
5. Sets of basic instructions are combined into new concepts and translated (i.e. "turn on these switches at this address" becomes "store this number at this memory address") -> programming languages
6. Interface to access and run multiple programs and devices attached to the CPU -> Operating System

These are "technologies" that build on each other, leveraging and combining more basic abstraction layers to do more complex tasks.

### Examples of Technology Stacks "Above" the Operating System
#### Programming on your computer
- Mapping keyboard input to ASCII characters and storing those characters in a file -> **Text Editor**
- Defining the rules for translating ASCII characters into instructions -> **Programming Language**
- Converting ASCII characters into machine code, or executing them directly -> **Compiler / Interpreter**
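
These three layers can be observed from inside Python itself. This is a sketch under one caveat: Python compiles source text to bytecode for its own virtual machine rather than to CPU machine code, so it stands in for the compiler step only by analogy. The one-line program is invented.

```python
import dis

# What a text editor stores: a sequence of characters.
source = "total = 2 + 3"

# Each character is saved as a numeric code (ASCII for these characters).
codes = [ord(ch) for ch in source]

# The language's rules turn that text into lower-level instructions.
code_object = compile(source, "<example>", "exec")
instruction_names = [ins.opname for ins in dis.get_instructions(code_object)]

# The interpreter then executes those instructions.
namespace = {}
exec(code_object, namespace)
print(codes[:5], namespace["total"])
```

The exact instruction names in `instruction_names` vary between Python versions, which is why they are not printed here.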

#### Website
- Linux
- Apache
- MySQL
- PHP

#### DevLeague Documentation Stack
- Jupyter Notebooks
- GitHub Desktop
- GitHub.com
- Internet



## Basic Scripting and Repeatable Analysis
What is a scripting programming language?
> Typically scripting languages are intended to be very fast to learn and write in, either as short source code files or interactively in a read–eval–print loop (REPL, language shell). This generally implies relatively simple syntax and semantics; typically a "script" (code written in the scripting language) is executed from start to finish, as a "script", with no explicit entry point.

For more: https://en.wikipedia.org/wiki/Scripting_language

### What are scripting languages we will encounter?
- R
- Python
- Shell Scripts

### What are examples of things we need to repeat?
- Data loading operations
        - Specifying files
        - configuration options in the loads, such as delimiters
- Data cleaning operations
        - specifying rows or columns to exclude
        - replacing or transforming data values
- Data combination / compilation
        - specifying the terms of combination, what information to keep, what to exclude
        - specifying which data to combine
- Specifying the source / target structure of key information
        - how we assign meaning to the data file
        - how we intend to store it
- Analytical Operations
        - Sorting
        - Filtering
        - Calculating Statistics
- Display Operations
        - outputting formatted text display
        - writing output to a file
        - configuring and displaying a chart
- Loading Libraries and External Resources
        - specifying libraries and versions
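
Several of the operations above can be captured in one short script so that the analysis runs the same way every time. The following is a minimal sketch: the semicolon-delimited weather readings, the column names, and the function names are all invented, and the data is embedded as a string so the example is self-contained (a real script would read a file).

```python
import csv
import io
import statistics

# Stand-in for a file on disk; the rows are invented.
RAW = """station;temp_c
A;21.5
B;
A;23.0
B;19.0
"""

DELIMITER = ";"  # load configuration lives in one place


def load(text):
    """Data loading: parse delimited text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=DELIMITER))


def clean(rows):
    """Data cleaning: drop rows with no reading, convert text to numbers."""
    return [{"station": r["station"], "temp_c": float(r["temp_c"])}
            for r in rows if r["temp_c"]]


def analyze(rows):
    """Analytical operations: sort and calculate statistics."""
    temps = sorted(r["temp_c"] for r in rows)
    return {"n": len(temps), "mean": statistics.mean(temps), "max": temps[-1]}


def report(summary):
    """Display operation: format the results as text."""
    return (f"{summary['n']} readings, mean {summary['mean']:.1f} C, "
            f"max {summary['max']:.1f} C")


summary = analyze(clean(load(RAW)))
print(report(summary))
```

Because each step is written down, a change such as a different delimiter or a new exclusion rule is made in one place, and the whole analysis can be rerun.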

## Connections to Corporate Databases, APIs, Data Warehouses

### How do data collecting entities share data?
We know the common *formats* that entities use to share data: XML, JSON, and CSV. However, what *technologies* and techniques are used to share data?
- Hosting data files on an internal network
- Publishing data files onto the web
- Publishing data on web pages
- Publishing reports and exports (SurveyMonkey export, ADP report)
- Providing API services (data.hawaii.gov)
- SQL / database services

#### What is a Network Service?
>In computer networking, a network service is an application running at the network application layer and above, that provides data storage, manipulation, presentation, communication or other capability which is often implemented using a client-server or peer-to-peer architecture based on application layer network protocols.

>Each service is usually provided by a server component running on one or more computers (often a dedicated server computer offering multiple services) and accessed via a network by client components running on other devices. However, the client and server components can both be run on the same machine.

https://en.wikipedia.org/wiki/Network_service

Are we using any examples of Network Services?
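
To make the client and server roles concrete, here is a minimal sketch in which both components run on the same machine, as the definition above allows. It uses only Python's standard library. The handler class name, the `/reading` path, and the JSON payload are invented for the example and do not come from any real service.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StationHandler(BaseHTTPRequestHandler):
    """Server side: answers every GET request with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"station": "A", "temp_c": 21.5}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress request logging to keep the example quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), StationHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: request the data over HTTP and parse the JSON reply.
client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
client.request("GET", "/reading")
reading = json.loads(client.getresponse().read().decode("utf-8"))
client.close()

server.shutdown()
server.server_close()
print(reading)
```

The client never touches the server's internals. It only knows an address, a path, and the format of the reply, which is the same arrangement a web API relies on.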

#### What is an API?
- An illustrated example: https://www.quora.com/What-is-an-API-4/answer/Katy-Levinson
- More perspectives:
        - https://www.quora.com/What-is-an-API-4
        - https://medium.freecodecamp.org/what-is-an-api-in-english-please-b880a3214a82


#### What is a Database Service?
A database service provides access to a database server over a network.

#### What is a Data Warehouse?

> In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for knowledge workers throughout the enterprise.

https://en.wikipedia.org/wiki/Data_warehouse

## Read / Write / Edit and Convert Data Files
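
As one possible illustration of this topic, the sketch below writes a small CSV file, reads it back, edits a column, and converts the result to JSON using Python's standard library. The file names, column names, and rows are invented, and a temporary directory is used so nothing is left in your working folder.

```python
import csv
import json
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "students.csv"
json_path = workdir / "students.json"

# Write: create a small CSV file (rows are invented).
with csv_path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["first_name", "birth_year"])
    writer.writerows([["Ada", 2005], ["Alan", 2006]])

# Read: load the CSV back as a list of dicts; every value arrives as text.
with csv_path.open(newline="") as f:
    records = list(csv.DictReader(f))

# Edit: convert birth_year from text to an integer.
for record in records:
    record["birth_year"] = int(record["birth_year"])

# Convert: save the same records as JSON, then reload to check the result.
json_path.write_text(json.dumps(records, indent=2))
reloaded = json.loads(json_path.read_text())
print(reloaded)
```

Note that CSV stores everything as text, while JSON keeps the distinction between numbers and strings, so the edit step matters for the conversion.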




## Scraping
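
Web scraping relies on a parser to pull data out of a page's markup. The following sketch uses Python's built-in `html.parser` on a hard-coded HTML snippet. The page content and the `TableParser` class are invented for illustration, and a real scraper would first download the page text.

```python
from html.parser import HTMLParser

# Stand-in for a downloaded page; the table contents are invented.
PAGE = """
<table>
  <tr><td>Honolulu</td><td>27</td></tr>
  <tr><td>Hilo</td><td>25</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])   # start a new row
        elif tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Only keep text found inside a cell; ignore whitespace between tags.
        if self.in_cell:
            self.rows[-1].append(data.strip())


parser = TableParser()
parser.feed(PAGE)
print(parser.rows)
```

The result is a list of rows, which is already close to the tabular structure discussed earlier.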


## Joining on a common Key

### Examples of What to Google
- "Shell Scripting Basics"
- "Python Scripting Basics"
- "R Scripting Basics"
- "Beginner Scripting"
- "Scripting for Data Analysis"
- "Python for marketing analysts"
- "Python for statisticians"
- http://bigdata-madesimple.com/step-by-step-approach-to-perform-data-analysis-using-python/
- https://www.quora.com/Whats-the-difference-between-a-programming-language-and-a-scripting-language
- https://www.quora.com/topic/Scripting-programming

### Examples of What to Document
- What are the recommendations on how to proceed?
- What are the varying perspectives on programming languages and on learning to program?
- What libraries are people recommending?
- What learning methods are people recommending?

## Project Ideas

- Creating a SQLite database from a spreadsheet
- Connecting to a SQLite database from R
- Connecting to a SQLite database from Python
- Creating and populating a remote database using MySQL
- Querying from a CSV using command line tools
- Querying from SQLite vs a CSV from command line tools
- Constructing a complex (3-4 table) data model in SQL
- Experimenting with joins
- Connecting to websites and creating a data table from web data

## Some Example Projects
- http://erikrood.com/Python_References/web_scrape.html
- https://www.dataquest.io/blog/python-api-tutorial/
- https://tclavelle.github.io/blog/r_and_apis/
- http://www.gregreda.com/2013/03/03/web-scraping-101-with-python/ 
- http://www.gregreda.com/2014/07/27/scraping-craigslist-for-tickets/


## A Few Example Tutorials to Walk Through
- https://community.modeanalytics.com/sql/tutorial/introduction-to-sql/
- https://sqlite.org/quickstart.html
- https://sqlite.org/cli.html
- http://www.sqlitetutorial.net/sqlite-import-csv/
- https://docs.python.org/2/library/sqlite3.html


