# Analyze Data to Answer Questions

Notes from this course: https://www.coursera.org/learn/analyze-data/

## Module 1: Organizing data to begin analysis

### Learning log

#### Let's get organized
- Analysis
    - The process used to make sense of the data collected
    - The goal of analysis is to identify trends and relationships within data so you can accurately answer the question you're asking
- The 4 phases of analysis
    - Organize data
        - Most of the datasets you will use will be organized as tables
        - Tables are helpful because they let you manipulate your data and categorize it
        - Having distinct categories and classifications lets you focus on, and differentiate between, your data quickly and easily
    - Format and adjust data
        - Sorting and filtering are two ways you can keep things organized when you format and adjust data to work with it
        - A filter can help you find errors or outliers so you can fix or flag them before your analysis
        - An analyst sorts and filters data during the format-and-adjust phase of analysis
        - The benefit of filtering the data is that after you fix errors or identify outliers, you can remove the filter and return the data to its original organization
        - Example
            - You are working with a dataset from a local community college. You sort the students alphabetically by last name
    - Get input from others
        - The phase of analysis where you compare your data to external sources
        - Example
            - You ask volunteers at a theater production which tasks they have already completed and add that data to a spreadsheet containing all required tasks. You will use the information provided by the volunteers to figure out which tasks still need to be done.
    - Transform data
        - The phase of analysis that attempts to determine whether there are any patterns in the data
        - Done by observing relationships between data points and making calculations
        - Example
            - You are working with three datasets about voter turnout in your county. First, you identify relationships and patterns between the datasets. Then, you use formulas and functions to make calculations based on your data
- People refer to Google data as a lens to human curiosity
- The bottom line is that it's important to have your data in the right format. So always be prepared to adjust, no matter how far into your analysis you are.
- Outlier
    - Data points that are very different from similarly collected data and might not be reliable values
- Sorting
    - Sorting is when you arrange data into a meaningful order to make it easier to understand, analyze, and visualize
    - It ranks your data based on a specific metric you choose
    - Sorting will arrange the data in a meaningful way and give you immediate insights
    - Sorting also helps you to group similar data together by a classification
- Filtering
    - Filtering is used when you are only interested in seeing data that meets specific criteria, and hiding the rest
    - Filtering is really useful when you have lots of data
    - You can save time by zeroing in on the data that is really important or the data that has bugs or errors
    - Filtering gives you the ability to find what you are looking for without too much effort
    - Use filtering when you need to reduce the amount of data that is displayed
    - After you filter data, you can sort the filtered data, too
- Database organization
    - Enables analysts to decide which data is relevant to pull for a specific analysis, and which data types and variables are appropriate
- Sort sheet
    - All of the data in the spreadsheet is sorted by the values in one chosen column
    - Data across rows is kept together
- Sort range
    - Doesn't keep the information across rows together
    - Nothing else on the spreadsheet is rearranged besides the specified cells in a column
- Customized sort order
    - When you sort data in a spreadsheet using multiple conditions
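
The difference between sort sheet, sort range, a customized sort order, and a filter can be sketched in plain Python. This is only an illustration of the ideas in the notes above, not how a spreadsheet implements them; the sample rows, column names, and the `credits > 120` threshold are made up for the example.

```python
# Hypothetical sample rows; names and values are invented for illustration.
rows = [
    {"last_name": "Diaz", "program": "Nursing", "credits": 30},
    {"last_name": "Adams", "program": "Welding", "credits": 12},
    {"last_name": "Chen", "program": "Nursing", "credits": 45},
    {"last_name": "Brown", "program": "Welding", "credits": 450},  # possible outlier
]

# "Sort sheet": whole rows are reordered by one column, so each row stays intact.
sort_sheet = sorted(rows, key=lambda r: r["last_name"])

# "Sort range": only the cells of one column are reordered. The other columns
# stay where they were, so names no longer line up with their original rows.
sorted_names = sorted(r["last_name"] for r in rows)
sort_range = [dict(r, last_name=name) for r, name in zip(rows, sorted_names)]

# Customized sort order: several conditions at once
# (program A to Z, then credits from highest to lowest).
custom_sort = sorted(rows, key=lambda r: (r["program"], -r["credits"]))

# Filter: show only rows that meet a criterion. The original list is untouched,
# which mirrors removing the filter to get the original organization back.
flagged = [r for r in rows if r["credits"] > 120]

print(sort_sheet[0])   # Adams keeps the Welding / 12 values from the same row
print(sort_range[0])   # Adams is now paired with another row's values
print(flagged)
```

In the sort range result, "Adams" ends up next to the program and credits that belonged to "Diaz", which is why sort range is risky when the columns describe the same record.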

#### Glossary
https://docs.google.com/document/d/1b70u-s0d9YdlUY2xORogZTlLD6BZSnSB2r99lI4IufY/template/preview

---

## Module 2: Formatting and adjusting data

### Learning log

#### Convert and format data
- Incorrectly formatted data can:
    - Lead to mistakes
    - Take time to fix
    - Affect stakeholders' decision-making
- One of the ways to help ensure that you have an accurate analysis of your data is by putting all of it in the correct format. This is true even if you have already cleaned and processed your data. As a part of getting your data ready for analysis, you will need to convert and format your data early on in the process.
- Data validation in spreadsheets
    - Allows you to control what can and can't be entered in your worksheet
    - Add dropdown lists with predetermined options
    - Create custom checkboxes
    - Protect structured data and formulas
- Conditional formatting
    - A spreadsheet tool that changes how cells appear when values meet specific conditions
- SQL
    - COERCION
        - Work with big numbers
    - UNIX_DATE
        - Returns the number of days that have passed since January 1, 1970 and is used to compare and work with dates across multiple time zones
    - SAFE_CAST
        - In BigQuery, a query returns an error if a CAST in it fails. To avoid that error, use the SAFE_CAST function instead
        - SAFE_CAST returns NULL instead of an error when the conversion fails
- Openness (or open data)
    - Free access, usage, and sharing of data
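
The behavior described for SAFE_CAST and UNIX_DATE in the SQL notes above can be imitated in Python to make it concrete. The helper names `safe_cast_int` and `unix_date` are hypothetical stand-ins written for this sketch; they are not BigQuery functions and only mimic the behavior the notes describe.

```python
from datetime import date


def safe_cast_int(value):
    """Return value converted to int, or None if the conversion fails.

    Mimics the idea of SAFE_CAST: a failed conversion yields a null
    value (None here) instead of raising an error.
    """
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def unix_date(d):
    """Return the number of days between 1970-01-01 and the date d."""
    return (d - date(1970, 1, 1)).days


print(safe_cast_int("42"))          # 42
print(safe_cast_int("apple"))       # None
print(unix_date(date(1970, 1, 2)))  # 1
```

A plain `int("apple")` would raise a `ValueError`, which corresponds to CAST failing the query; returning `None` lets the rest of the rows be processed.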

#### Get support during analysis
- `IF(end>start, end-start, 24+end-start)`
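    - Calculates elapsed time when the end can fall on the day after the start (for example, a period that crosses midnight); adding 24 assumes `start` and `end` are expressed in hours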
- The analyze stage is where you become the expert about your dataset
- Best practices for searching online
    - Thinking skills
    - Data analytics terms
    - Basic knowledge of tools
- Mental model
    - Your thought process and the way you approach a problem
- R
    - A programming language frequently used for statistical analysis, visualization, and other data analysis
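
The `IF(end>start, end-start, 24+end-start)` formula at the top of this list can be written as a small Python function to check how it behaves. This is a minimal sketch that assumes `start` and `end` are hours on a 24-hour clock and that the period lasts at most 24 hours; the function name `elapsed_hours` is made up for the example.

```python
def elapsed_hours(start, end):
    """Hours from start to end on a 24-hour clock.

    If end is not later than start, the period is assumed to have
    crossed midnight, so 24 hours are added (same logic as the formula).
    """
    if end > start:
        return end - start
    return 24 + end - start


print(elapsed_hours(9, 17))  # same-day period: 8
print(elapsed_hours(22, 6))  # crosses midnight: 8
```

Like the spreadsheet formula, this treats equal start and end values as a full 24-hour period rather than zero hours.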

#### Glossary
https://docs.google.com/document/d/1kpj3hm2NDlgI624cD7R13P1UaJ3V7g4t3DtdRhvKV7g/template/preview

##### Further reading
- [CONVERT](https://support.google.com/docs/answer/6055540?hl=en)
- [TO_PERCENT](https://support.google.com/docs/answer/3094284?hl=en)
- [Change date format](https://www.ablebits.com/office-addins-blog/2019/08/13/google-sheets-change-date-format/)
- [How to convert text to numbers](https://productivityspot.com/convert-text-to-numbers-google-sheets/)
- [How to split and combine cells](https://www.techrepublic.com/article/how-to-split-or-combine-text-cells-with-google-sheets/)
- [Conversion Rules in Standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_rules)
- [CAST and CONVERT](https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-ver15)
- [MySQL CAST Functions and Operators](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html)
- [Keyboard shortcuts for Google Sheets](https://support.google.com/docs/answer/181110)
- [List of Google Sheets Functions](https://support.google.com/docs/table/25273?hl=en)
- [23 Must-Know Google Sheet Formulas](https://blog.golayer.io/google-sheets/google-sheets-formulas)
- [18 Google Sheets Formula Tips and Techniques](https://www.benlcollins.com/spreadsheets/google-sheets-formulas-techniques/)


---

## Module 3: Aggregating data for analysis

### Learning log

#### VLOOKUP and data aggregation

#### Use JOINS to aggregate data in SQL

#### Work with subqueries

#### Glossary

##### Further reading

---

## Module 4: Performing data calculations