Advancing into Analytics: Let's Practice!

This repo contains extra practice sets based on Advancing into Analytics: From Excel to Python and R.

Skip to sections:

Part I: Foundations of Analytics in Excel
Part II: From Excel to R
Part III: From Excel to Python

Solutions are available for students of Advancing into Analytics courses.

I can also provide to high school or college teachers using the book in their course. Learn more about using Advancing into Analytics in the classroom here.

All datasets referred to can be found in the datasets subfolder. There is also a file with links to their sources where you can get column definitions and so forth.

Part I: Foundations of Analytics in Excel

1. Foundations of Exploratory Data Analysis

Using the titanic dataset and Excel:

Classify each variable’s type.
Build and visualize a two-way frequency of Pclass by Survived.
Return descriptive statistics for Fare. How does the distribution differ by Pclass?
Visualize the distribution of Age.

2. Foundations of Probability

Simulate 100 coin tosses and visualize the result as a histogram in Excel. Do some online research -- what probability distribution does a series of coin tosses follow?
Now take the average of twenty coin tosses and visualize the distribution of 100 such averages as a histogram. What probability distribution do the sample means follow?
Plot two probability distributions in Excel on the same area graph: one with a mean of 25 and standard deviation of 8, the other with 40 and 2, respectively.

3. Foundations of Inferential Statistics

Using the geyser dataset and Excel, test the relationship between the kind of eruption (long or short) and the waiting time until the next eruption:

What are your statistical hypotheses?
Are your results statistically significant? What evidence does this lend to your hypotheses?
What is the estimated effect size?

Answer the same questions, but for the relationship between having a CD-ROM and the price of the computer in the computers datset.

4. Correlation and Regression

Using the car-prices dataset and Excel:

What is the correlation between mileage and price?
Which is the independent and dependent variable?
Find the coefficents and their statistical significance along with the R-square of the regression model.

Using the birthweights dataset and Excel:

Derive a correlation for all relevant variables.
Between length and birthweight, which is the independent versus dependent variable?
Find the coefficents and their statistical significance along with the R-square of the regression model.

5. The Data Analytics Stack

This chapter is a conceptual overview of how Excel, R and Python fit into the broader data analytics stack. If you'd like to get started learning other elements of analytics, I suggest the following resources:

Codeacademy's free Introduction to SQL course
Power BI on Microsoft Learn
If you're on Mac and can't run Power BI, start with Tableau Public instead

Part II: From Excel to R

6. First Steps with R for Excel Users

Run the following code. What happens?

temps <- c(66, 71, 68, 74, 70)
days <- c('Mon', 'Tue', 'Wed', 'Thr', 'Fri')

names(temps) <- days

temps['Wed']

Knowing what you do now, return the values for Monday and Friday first by index position, then by name.

Explore the R Help page in RStudio: from the menu click Help > R Help.
Create an RStudio Community account. This is an excellent forum for getting help in R and RStudio moderated by the pros.
If you would like additional reading on getting started in R, check out the following:

7. Data Structures in R

Easter egg time! In this chapter we used the write_xlsx() function from package writexl. Pull its help documentation... an anthropomorphized office supply awaits!
Many packages include vignettes, similar to white papers showing important features and concepts of the package. These can be viewed via the browseVignettes() function.

Open the vignettes for dpyr and tidyverse. The vignettes will open in a separate internet tab. Use the function's help documentation to confirm what to do here.

How many datasets does the babynames package come with? Choose one, run its descriptive statistics and write the dataset to an Excel worksheet.

8. Data Manipulation and Visualization in R

Using the car-crashes dataset:

The total variable represents the number of drivers involved in fatal collisions per billion miles. Derive a column that represents the number of such drivers per million miles.
Find the average insurance premium for each Census region.
Visualize the distribution of no_previous. Which distribution does it appear to follow?
Visualize the relationship between ins_premium and ins_losses. Re-label your axes and add a chart title. Does the relationship appear to be linear?
Create a data frame in R that would look like this in Excel:

9. Capstone: R for Data Analytics

Using the birthweights dataset:

Is there a significant difference in the gestation period of babies whose mother is and is not a smoker?
Produce a correlation matrix of the relevant variables in this dataset.
Visualize the relationship of weight by length.
Regress weight on height. Find the equation of the fit regression line. Is there a significant relationship? What percentage of the variance in weight is explained by height?
Split your regression model into training and testing subsets. What is the R-squared and RMSE on your test model?

Extra credit: Conduct your work in an R Notebook (included with RStudio) or Jupyter Notebook (see instructions here for using R with Jupyter).

Part III: From Excel to Python

10. First Steps with Python for Excel Users

Continue getting to know the Jupyter interface. In particular, under the Help menu you will find an interactive "User Interface Tour" to walk you through the interface, as well as a guide to Markdown, which you may need for the next exercise...
Practice your Markdown skills by including the following elements in a notebook:

Two different levels of headers
Italics
A hyperlink
An image

Google Colab is a free cloud-based environment based on the Jupyter Notebook. This is a great tool for sharing your data projects on the web. Check out this introductory Colab notebook to get started.

11. Data Structures in Python

Easter egg time! Run the following code: import this. What happens?
Install the vega_datasets from pip, then run the following code to import a movies dataset:

import pandas as pd
from vega_datasets import data
movies = data.movies()
movies.head()

Generate descriptive statistics for the dataset.
Write the first four columns of the dataset to an Excel file.

12. Data Manipulation and Visualization in Python

Using the car-crashes dataset:

The total variable represents the number of drivers involved in fatal collisions per billion miles. Derive a column that represents the number of such drivers per million miles.
Find the average insurance premium per Census region.
Visualize the distribution of no_previous. Which distribution does it appear to follow?
Visualize the relationship between ins_premium and ins_losses. Re-label your axes and add a chart title.
Create a data frame in Python that would look like this in Excel:

13. Capstone: Python for Data Analytics

Using the birthweights dataset:

Is there a significant difference in the gestation period of babies whose mother is and is not a smoker?
Produce a correlation matrix of the relevant variables in this dataset.
Visualize the relationship of weight by length.
Regress weight on height. Find the equation of the fit regression line. Is there a significant relationship? What percentage of the variance in weight is explained by height?
Split your regression model into training and testing subsets. What is the R-squared and RMSE on your test model?

14. Conclusion and Next Steps

The conclusion offers a few next steps for continuing on in your analytics journey. Here are a couple of quick wins on that front:

If you've not already downloaded GitHub Desktop, do it now. If you have, continue to practice. Either way, the official Desktop documentation is a great place to learn more.
Read "How to Design Smart Business Experiments" by Thomas H. Davenport (Harvard Business Review)

Thanks for "advancing into analytics" with me .... roll credits!

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
datasets		datasets
images		images
resources		resources
.gitignore		.gitignore
README.md		README.md
about-the-datasets.md		about-the-datasets.md
advancing-into-analytics-practice.Rproj		advancing-into-analytics-practice.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

datasets

datasets

images

images

resources

resources

.gitignore

.gitignore

README.md

README.md

about-the-datasets.md

about-the-datasets.md

advancing-into-analytics-practice.Rproj

advancing-into-analytics-practice.Rproj

Repository files navigation

Advancing into Analytics: Let's Practice!

Solutions are available for students of Advancing into Analytics courses.

Part I: Foundations of Analytics in Excel

1. Foundations of Exploratory Data Analysis

2. Foundations of Probability

3. Foundations of Inferential Statistics

4. Correlation and Regression

5. The Data Analytics Stack

Part II: From Excel to R

6. First Steps with R for Excel Users

7. Data Structures in R

8. Data Manipulation and Visualization in R

9. Capstone: R for Data Analytics

Part III: From Excel to Python

10. First Steps with Python for Excel Users

11. Data Structures in Python

12. Data Manipulation and Visualization in Python

13. Capstone: Python for Data Analytics

14. Conclusion and Next Steps

About

Releases

Packages

stringfestdata/advancing-into-analytics-practice

Folders and files

Latest commit

History

Repository files navigation

Advancing into Analytics: Let's Practice!

Solutions are available for students of Advancing into Analytics courses.

Part I: Foundations of Analytics in Excel

1. Foundations of Exploratory Data Analysis

2. Foundations of Probability

3. Foundations of Inferential Statistics

4. Correlation and Regression

5. The Data Analytics Stack

Part II: From Excel to R

6. First Steps with R for Excel Users

7. Data Structures in R

8. Data Manipulation and Visualization in R

9. Capstone: R for Data Analytics

Part III: From Excel to Python

10. First Steps with Python for Excel Users

11. Data Structures in Python

12. Data Manipulation and Visualization in Python

13. Capstone: Python for Data Analytics

14. Conclusion and Next Steps

About

Resources

Stars

Watchers

Forks