# **Waze Navigation to Retention: Churn Prediction & Analysis**
**Google Advanced Data Analytics - Go Beyond the Numbers: Translate Data into Insights**

# **Course 3 End-of-course project: Exploratory data analysis**

In this activity, you will examine data provided and prepare it for analysis.
<br/>

**The purpose** of this project is to conduct exploratory data analysis (EDA) on a provided dataset.

**The goal** is to continue the examination of the data that you began in the previous Course, adding relevant visualizations that help communicate the story that the data tells.
<br/>


*This activity has 4 parts:*

**Part 1:** Imports, links, and loading

**Part 2:** Data Exploration
*   Data cleaning


**Part 3:** Building visualizations

**Part 4:** Evaluating and sharing results

<br/>

Follow the instructions and answer the questions below to complete the activity. Then, you will complete an executive summary using the questions listed on the [PACE Strategy Document](https://docs.google.com/document/d/1iSHdbfQR6w8RClJNWai8oJXn9tQmYoTKn6QohuaK4-s/template/preview?resourcekey=0-ZIHnbxL1dd2u9A47iEVXvg).

Be sure to complete this activity before moving on. The next course item will provide you with a completed exemplar to compare to your own work.

# **Visualize a story in Python**

# **PACE stages**


## **PACE: Plan**

Consider the questions in your PACE Strategy Document to reflect on the Plan stage.



### **Task 1. Imports and data loading**

For EDA of the data, import the data and packages that will be most helpful, such as pandas, numpy, matplotlib, and seaborn.

- `pandas`: For data manipulation and analysis.
- `numpy`: For numerical operations.
- `matplotlib`: For creating static, interactive, and animated visualizations.
- `seaborn`: For making statistical graphics (built on top of Matplotlib and integrates well with Pandas).

After importing the packages, we'll load the dataset to get an initial understanding of the data.

In [51]:
# Importing essential libraries for EDA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Setting up the style for plots
sns.set_style('darkgrid')

# Loading the Waze dataset
file_path = 'waze_dataset.csv'
waze_data = pd.read_csv(file_path)

# Displaying the first few rows of the dataset to understand its structure
waze_data.head()

| | ID | label | sessions | drives | total_sessions | n_days_after_onboarding | total_navigations_fav1 | total_navigations_fav2 | driven_km_drives | duration_minutes_drives | activity_days | driving_days | device |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | retained | 283 | 226 | 296.748273 | 2276 | 208 | 0 | 2628.845068 | 1985.775061 | 28 | 19 | Android |
| 1 | 1 | retained | 133 | 107 | 326.896596 | 1225 | 19 | 64 | 13715.92055 | 3160.472914 | 13 | 11 | iPhone |
| 2 | 2 | retained | 114 | 95 | 135.522926 | 2651 | 0 | 0 | 3059.148818 | 1610.735904 | 14 | 8 | Android |
| 3 | 3 | retained | 49 | 40 | 67.589221 | 15 | 322 | 7 | 913.591123 | 587.196542 | 7 | 3 | iPhone |
| 4 | 4 | retained | 84 | 68 | 168.24702 | 1562 | 166 | 5 | 3950.202008 | 1219.555924 | 27 | 18 | Android |


We have successfully loaded the dataset and can see the first few rows. Here's a brief description of some of the columns we have:

- `ID`: Unique identifier for the user.
- `label`: Retention label indicating whether the user is retained or has churned.
- `sessions`: Number of sessions.
- `drives`: Number of drives.
- `total_sessions`: Total number of sessions, including current and previous periods.
- `n_days_after_onboarding`: Number of days after onboarding.
- `total_navigations_fav1` and `total_navigations_fav2`: Total navigations to favorite destinations 1 and 2.
- `driven_km_drives`: Total kilometers driven.
- `duration_minutes_drives`: Total duration of drives in minutes.
- `activity_days`: Number of active days.
- `driving_days`: Number of driving days.
- `device`: Device used, e.g., Android or iPhone.

#### **1. Total Entries in the Dataset**
The dataset contains 14,999 rows, but can we assume that each record represents a unique user?



In [52]:
# Call the info() method on the Waze DataFrame
waze_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14999 entries, 0 to 14998
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   ID                       14999 non-null  int64  
 1   label                    14299 non-null  object 
 2   sessions                 14999 non-null  int64  
 3   drives                   14999 non-null  int64  
 4   total_sessions           14999 non-null  float64
 5   n_days_after_onboarding  14999 non-null  int64  
 6   total_navigations_fav1   14999 non-null  int64  
 7   total_navigations_fav2   14999 non-null  int64  
 8   driven_km_drives         14999 non-null  float64
 9   duration_minutes_drives  14999 non-null  float64
 10  activity_days            14999 non-null  int64  
 11  driving_days             14999 non-null  int64  
 12  device                   14999 non-null  object 
dtypes: float64(3), int64(8), object(2)
memory usage: 1.5+ MB


In [53]:
# Checking for duplicate values in the 'ID' column
unique_users_count = waze_data['ID'].nunique()
total_rows = len(waze_data)
duplicate_users_count = total_rows - unique_users_count

unique_users_count, duplicate_users_count

(14999, 0)

- There are *14,999* unique user IDs in the dataset.
- There are 0 duplicate user IDs.

Each row in the dataset therefore represents a unique user, which confirms our initial observation that the dataset contains information on 14,999 users. It's always good practice to validate assumptions like this, especially when working with real-world data where unexpected anomalies can occur.

#### **1. Columns and Their Data Type**
We have a variety of data types in our dataset. Let's list each data type and their corresponding columns:

##### Integer Columns
These columns contain discrete numerical values.

In [54]:
# Identifying integer columns
int_columns = waze_data.select_dtypes(include='int64').columns

# Displaying unique values for integer columns (if not too many) or their range
int_summary = {}
for col in int_columns:
    unique_vals = waze_data[col].unique()
    if len(unique_vals) <= 10:  # Arbitrary cutoff for displaying unique values
        int_summary[col] = unique_vals
    else:
        int_summary[col] = (waze_data[col].min(), waze_data[col].max())

int_summary

{'ID': (0, 14998),
 'sessions': (0, 743),
 'drives': (0, 596),
 'n_days_after_onboarding': (4, 3500),
 'total_navigations_fav1': (0, 1236),
 'total_navigations_fav2': (0, 415),
 'activity_days': (0, 31),
 'driving_days': (0, 30)}

- ID: Ranges from 0 to 14,998, which likely represents a unique identifier for each user.
- sessions: Ranges from 0 to 743, indicating the number of sessions a user has had.
- drives: Ranges from 0 to 596, representing the number of drives a user has taken.
- n_days_after_onboarding: Ranges from 4 to 3,500, which shows the number of days since a user was onboarded.
- total_navigations_fav1: Ranges from 0 to 1,236, indicating the total navigations to a user's first favorite destination.
- total_navigations_fav2: Ranges from 0 to 415, representing the total navigations to a user's second favorite destination.
- activity_days: Ranges from 0 to 31, which shows the number of active days a user had.
- driving_days: Ranges from 0 to 30, indicating the number of days a user was driving.

##### Float Columns
These columns contain continuous numerical values. We'll display the min, max, mean, and std for these columns.

In [55]:
# Identifying float columns
float_columns = waze_data.select_dtypes(include='float64').columns

# Displaying statistics for float columns
float_summary = waze_data[float_columns].describe().loc[['min', 'max', 'mean', 'std']]
float_summary.transpose()


| | min | max | mean | std |
|---|---|---|---|---|
| total_sessions | 0.220211 | 1216.154633 | 189.964447 | 136.405128 |
| driven_km_drives | 60.44125 | 21183.40189 | 4039.340921 | 2502.149334 |
| duration_minutes_drives | 18.282082 | 15851.72716 | 1860.976012 | 1446.702288 |


total_sessions:
- Minimum: ~0.22
- Maximum: ~1216.15
- Mean: ~189.96
- Standard Deviation: ~136.41
- This column represents the total number of sessions, including current and previous periods.

driven_km_drives:
- Minimum: ~60.44 km
- Maximum: ~21,183.40 km
- Mean: ~4,039.34 km
- Standard Deviation: ~2,502.15 km
- This column indicates the total kilometers driven by the user.

duration_minutes_drives:
- Minimum: ~18.28 minutes
- Maximum: ~15,851.73 minutes
- Mean: ~1,860.98 minutes
- Standard Deviation: ~1,446.70 minutes
- This column represents the total duration of drives in minutes.

##### Object (String) Columns
For object columns, we'll list the unique values to understand their nature. If a column has too many unique values, we'll display the count of unique values instead.

In [97]:
# Identifying object columns
object_columns = waze_data.select_dtypes(include='object').columns

# Displaying unique values for object columns (if not too many) or their count
object_summary = {}
for col in object_columns:
    unique_vals = waze_data[col].unique()
    if len(unique_vals) <= 10:  # Arbitrary cutoff for displaying unique values
        object_summary[col] = unique_vals
    else:
        object_summary[col] = len(unique_vals)

object_summary

{'label': array(['retained', 'churned', nan], dtype=object),
 'device': array(['Android', 'iPhone'], dtype=object)}

For the object (string) columns, we have:

label:
- Unique values: 'retained', 'churned', and NaN (indicating missing values).
- This column indicates whether a user is retained or has churned.

device:
- Unique values: 'Android' and 'iPhone'.
- This column represents the type of device the user has, either an Android phone or an iPhone.

#### **1. Missing Values**
**The unique values of the `label` column include NaN, which means the column has missing values.**


In [98]:
# Calculating the number of missing values in the 'label' column
missing_values_label = waze_data['label'].isnull().sum()
missing_values_label

700

**The label column has 700 missing values. Since this column is essential for our analysis, we'll need to decide on a strategy for handling these missing values during our data cleaning phase. Options might include removing these records, imputation, or categorizing them differently based on available data.**
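The options mentioned above can be sketched on a small stand-in table. This is only an illustration: the `toy` DataFrame and its values are invented, and whether dropping or relabeling is appropriate depends on whether the rows without a label look different from the rest, which is what the group comparison is meant to probe.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the Waze data (values invented for illustration)
toy = pd.DataFrame({
    "label": ["retained", "churned", np.nan, "retained", np.nan],
    "sessions": [120, 35, 80, 10, 60],
    "device": ["iPhone", "Android", "iPhone", "Android", "Android"],
})

# Tag each row by whether its label is present
status = np.where(toy["label"].isna(), "missing", "labeled")

# Compare a numeric column between rows with and without a label
sessions_by_status = toy.groupby(status)["sessions"].mean()
print(sessions_by_status)

# Option 1: drop the rows that have no label
labeled_only = toy.dropna(subset=["label"])

# Option 2: keep them under an explicit placeholder category
with_placeholder = toy.fillna({"label": "unknown"})
```

If the two groups look similar on the other columns, dropping the unlabeled rows is usually the simpler choice for a churn analysis, since the label is the target variable.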

To summarize:

- The dataset contains 14,999 entries.
- We have various data types in our dataset, including integers, floats, and objects (strings).
- Integer columns like `sessions` and `drives` capture discrete counts.
- Float columns like `total_sessions` and `driven_km_drives` capture continuous measurements.
- Object columns like `label` and `device` capture categorical information.
- The `label` column, which indicates user retention status, has 700 missing values.

## **PACE: Analyze**

- There are a total of 14,999 entries in the dataset.
- We have 13 columns, consisting of 8 integer types, 3 float types, and 2 object (string) types.
- The label column has some missing values (700 missing), as it has only 14,299 non-null entries out of 14,999.
- All other columns appear to be complete with no missing values.

Given that the label column is crucial for our analysis (as it indicates whether a user is retained or has churned), we'll need to decide how to handle these missing values during the data cleaning process.


Consider the questions in your PACE Strategy Document and those below where applicable to complete your code:
1. Does the data need to be restructured or converted into usable formats?

2. Are there any variables that have missing data?


==> ENTER YOUR RESPONSES TO QUESTIONS 1-2 HERE

### **Task 2. Data exploration and cleaning**

Consider the following questions:



1.  Given the scenario, which data columns are most applicable?

2.  Which data columns can you eliminate, knowing they won’t solve your problem scenario?

3.  How would you check for missing data? And how would you handle missing data (if any)?

4.  How would you check for outliers? And how would you handle outliers (if any)?







==> ENTER YOUR RESPONSES TO QUESTIONS 1-4 HERE

#### **Data overview and summary statistics**

Use the following methods and attributes on the dataframe:

* `head()`
* `size`
* `describe()`
* `info()`

It's always helpful to have this information at the beginning of a project, so you can refer back to it if needed.
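As a quick reminder of what each of these returns, here is a minimal sketch on an invented three-row table (the `toy` name and its values are placeholders, not the Waze data). Note that `size` is an attribute, not a method, and counts cells rather than rows.

```python
import pandas as pd

# Hypothetical three-row table used only to show what each call returns
toy = pd.DataFrame({
    "sessions": [10, 20, 30],
    "device": ["iPhone", "Android", "iPhone"],
})

print(toy.head(2))     # first two rows
print(toy.size)        # total number of cells: rows x columns
print(toy.shape)       # (rows, columns)
print(toy.describe())  # summary statistics for numeric columns
toy.info()             # dtypes, non-null counts, memory usage
```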

In [56]:
### YOUR CODE HERE ###

In [57]:
### YOUR CODE HERE ###

Generate summary statistics using the `describe()` method.

In [58]:
### YOUR CODE HERE ###

And summary information using the `info()` method.

In [59]:
### YOUR CODE HERE ###

## **PACE: Construct**

Consider the questions in your PACE Strategy Document to reflect on the Construct stage.

Consider the following questions as you prepare to deal with outliers:

1.   What are some ways to identify outliers?
2.   How do you make the decision to keep or exclude outliers from any future models?

==> ENTER YOUR RESPONSES TO QUESTIONS 1-2 HERE
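One common way to identify outliers numerically is the interquartile range (IQR) rule, which is the same rule most box plots use to draw their whiskers. The sketch below is an illustration under stated assumptions: the helper name `iqr_outlier_mask`, the multiplier `k=1.5`, and the sample values are all chosen here for demonstration.

```python
import pandas as pd

def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Invented values: seven typical counts and one extreme one
toy_sessions = pd.Series([10, 12, 14, 15, 16, 18, 20, 300])
mask = iqr_outlier_mask(toy_sessions)
print(toy_sessions[mask])
```

Flagging a value this way only says it is far from the bulk of the data. Whether to keep, cap, or exclude it is a separate decision that depends on the model and on whether the value is plausible.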

### **Task 3a. Visualizations**

Select data visualization types that will help you understand and explain the data.

Now that you know which data columns you’ll use, it is time to decide which data visualization makes the most sense for EDA of the Waze dataset.

**Question:** What type of data visualization(s) will be most helpful?

* Line graph
* Bar chart
* Box plot
* Histogram
* Heat map
* Scatter plot
* A geographic map



==> ENTER YOUR RESPONSE HERE

Begin by examining the spread and distribution of important variables using box plots and histograms.
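Since the same pair of plots is repeated for every numeric variable below, a small helper can cut down on repetition. The following is a sketch, not the exemplar solution: the function name `box_and_hist` is hypothetical, it uses matplotlib directly (seaborn's `boxplot()` and `histplot()` would work as well), and the data is randomly generated rather than taken from the Waze dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def box_and_hist(series, bins=20):
    """Draw a box plot and a histogram of one numeric column side by side."""
    values = series.dropna().to_numpy()
    fig, (ax_box, ax_hist) = plt.subplots(
        1, 2, figsize=(9, 3), gridspec_kw={"width_ratios": [1, 3]}
    )
    # Box plot: shows the median, quartiles, and points beyond the whiskers
    ax_box.boxplot(values)
    ax_box.set_xticks([])
    ax_box.set_ylabel(series.name)
    # Histogram: shows the shape of the distribution, with the median marked
    ax_hist.hist(values, bins=bins)
    median = np.median(values)
    ax_hist.axvline(median, color="red", linestyle="--",
                    label=f"median = {median:.1f}")
    ax_hist.set_xlabel(series.name)
    ax_hist.set_ylabel("number of users")
    ax_hist.legend()
    return fig

# Invented right-skewed sample standing in for a column such as `sessions`
rng = np.random.default_rng(0)
toy_sessions = pd.Series(rng.lognormal(mean=4.0, sigma=0.8, size=500),
                         name="sessions")
fig = box_and_hist(toy_sessions)
```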

#### **`sessions`**

_The number of occurrences of a user opening the app during the month_

In [60]:
# Box plot
### YOUR CODE HERE ###


In [61]:
# Histogram
### YOUR CODE HERE ###

The `sessions` variable is a right-skewed distribution with half of the observations having 56 or fewer sessions. However, as indicated by the boxplot, some users have more than 700.

#### **`drives`**

_An occurrence of driving at least 1 km during the month_

In [62]:
# Box plot
### YOUR CODE HERE ###


In [63]:
# Histogram
### YOUR CODE HERE ###


The `drives` information follows a distribution similar to the `sessions` variable. It is right-skewed, approximately log-normal, with a median of 48. However, some drivers had over 400 drives in the last month.

#### **`total_sessions`**

_A model estimate of the total number of sessions since a user has onboarded_

In [64]:
# Box plot
### YOUR CODE HERE ###


In [65]:
# Histogram
### YOUR CODE HERE ###

The `total_sessions` variable has a right-skewed distribution. The median total number of sessions is 159.6. This is interesting information because, if the median number of sessions in the last month was 56 and the median total sessions was ~160, then it seems that a large proportion of a user's total sessions might have taken place in the last month. This is something you can examine more closely later.

#### **`n_days_after_onboarding`**

_The number of days since a user signed up for the app_

In [66]:
# Box plot
### YOUR CODE HERE ###

In [67]:
# Histogram
### YOUR CODE HERE ###

The total user tenure (i.e., number of days since onboarding) is a uniform distribution with values ranging from near-zero to \~3,500 (\~9.5 years).

#### **`driven_km_drives`**

_Total kilometers driven during the month_

In [68]:
# Box plot
### YOUR CODE HERE ###

In [69]:
# Histogram
### YOUR CODE HERE ###

The number of kilometers driven in the last month per user is a right-skewed distribution with half the users driving under 3,495 kilometers. As you discovered in the analysis from the previous course, the users in this dataset drive _a lot_. The longest distance driven in the month was over half the circumference of the earth.

#### **`duration_minutes_drives`**

_Total duration driven in minutes during the month_

In [70]:
# Box plot
### YOUR CODE HERE ###

In [71]:
# Histogram
### YOUR CODE HERE ###

The `duration_minutes_drives` variable has a heavily skewed right tail. Half of the users drove less than \~1,478 minutes (\~25 hours), but some users clocked over 250 hours over the month.

#### **`activity_days`**

_Number of days the user opens the app during the month_

In [72]:
# Box plot
### YOUR CODE HERE ###

In [73]:
# Histogram
### YOUR CODE HERE ###

Within the last month, users opened the app on a median of 16 days. The box plot reveals a centered distribution. The histogram shows a nearly uniform distribution of ~500 people opening the app on each count of days. However, there are ~250 people who didn't open the app at all and ~250 people who opened the app every day of the month.

This distribution is noteworthy because it does not mirror the `sessions` distribution, which you might think would be closely correlated with `activity_days`.

#### **`driving_days`**

_Number of days the user drives (at least 1 km) during the month_

In [74]:
# Box plot
### YOUR CODE HERE ###

In [75]:
# Histogram
### YOUR CODE HERE ###

The number of days users drove each month is almost uniform, and it largely correlates with the number of days they opened the app that month, except the `driving_days` distribution tails off on the right.

However, there were almost twice as many users (\~1,000 vs. \~550) who did not drive at all during the month. This might seem counterintuitive when considered together with the information from `activity_days`. That variable had \~500 users opening the app on each of most of the day counts, but there were only \~250 users who did not open the app at all during the month and ~250 users who opened the app every day. Flag this for further investigation later.

#### **`device`**

_The type of device a user starts a session with_

This is a categorical variable, so you do not plot a box plot for it. A good plot for a binary categorical variable is a pie chart.
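For a two-category column, the shares can be computed with `value_counts(normalize=True)` and passed straight to a pie chart. This is a minimal sketch on invented values; the six-row `toy_device` series does not reflect the real device split.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented device values for illustration only
toy_device = pd.Series(
    ["iPhone", "Android", "iPhone", "iPhone", "Android", "iPhone"],
    name="device",
)

# Share of each category (fractions that sum to 1)
shares = toy_device.value_counts(normalize=True)

fig, ax = plt.subplots(figsize=(3, 3))
ax.pie(shares.to_numpy(), labels=list(shares.index), autopct="%1.1f%%")
ax.set_title("Users by device")
```

The same pattern applies to the `label` column once rows with a missing label have been handled.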

In [76]:
# Pie chart
### YOUR CODE HERE ###

There are nearly twice as many iPhone users as Android users represented in this data.

#### **`label`**

_Binary target variable (“retained” vs “churned”) indicating whether a user churned at any time during the month_

This is also a categorical variable, and as such would not be plotted as a box plot. Plot a pie chart instead.

In [77]:
# Pie chart
### YOUR CODE HERE ###

Less than 18% of the users churned.

#### **`driving_days` vs. `activity_days`**

Because both `driving_days` and `activity_days` represent counts of days over a month and they're also closely related, you can plot them together on a single histogram. This will help to better understand how they relate to each other without having to scroll back and forth comparing histograms in two different places.

Plot a histogram that, for each day, has a bar representing the counts of `driving_days` and `activity_days`.
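One way to get side-by-side bars is to pass both columns to a single `hist()` call with shared bin edges. The sketch below uses randomly generated day counts (not the Waze data) and picks integer bin edges from 0 to 32 so that each day count from 0 to 31 gets its own bin; these choices are assumptions made for the illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented day counts; driving days are constructed to never exceed activity days
rng = np.random.default_rng(42)
activity = rng.integers(0, 32, size=1000)
driving = np.minimum(activity, rng.integers(0, 31, size=1000))
toy = pd.DataFrame({"activity_days": activity, "driving_days": driving})

fig, ax = plt.subplots(figsize=(10, 4))
edges = np.arange(0, 33)  # one bin per day count 0..31
counts, _, _ = ax.hist(
    [toy["driving_days"].to_numpy(), toy["activity_days"].to_numpy()],
    bins=edges,
    label=["driving days", "activity days"],
)
ax.set_xlabel("days")
ax.set_ylabel("number of users")
ax.legend()
ax.set_title("driving_days vs. activity_days")
```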

In [78]:
# Histogram
### YOUR CODE HERE ###

As observed previously, this might seem counterintuitive. After all, why are there _fewer_ people who didn't use the app at all during the month and _more_ people who didn't drive at all during the month?

On the other hand, it could just be illustrative of the fact that, while these variables are related to each other, they're not the same. People probably just open the app more than they use the app to drive&mdash;perhaps to check drive times or route information, to update settings, or even just by mistake.

Nonetheless, it might be worthwhile to contact the data team at Waze to get more information about this, especially because it seems that the number of days in the month is not the same between variables.

Confirm the maximum number of days for each variable&mdash;`driving_days` and `activity_days`.

In [79]:
### YOUR CODE HERE ###

It's true. Although it's possible that not a single user drove all 31 days of the month, it's highly unlikely, considering there are 15,000 people represented in the dataset.

One other way to check the validity of these variables is to plot a simple scatter plot with the x-axis representing one variable and the y-axis representing the other.

In [80]:
# Scatter plot
### YOUR CODE HERE ###

Notice that there is a theoretical limit. If you use the app to drive, then by definition it must count as a day-use as well. In other words, you cannot have more drive-days than activity-days. None of the samples in this data violate this rule, which is good.
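The rule can also be checked without a plot by counting the rows that break it. A minimal sketch on invented values (the `toy` table is a placeholder for the real DataFrame):

```python
import pandas as pd

# Invented rows for illustration
toy = pd.DataFrame({
    "activity_days": [0, 5, 31, 12, 20],
    "driving_days": [0, 3, 30, 12, 15],
})

# Maximum number of days recorded for each variable
max_days = toy[["driving_days", "activity_days"]].max()
print(max_days)

# Rows where a user has more driving days than activity days (should be none)
violations = toy[toy["driving_days"] > toy["activity_days"]]
print(len(violations))
```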

#### **Retention by device**

Plot a histogram that has four bars&mdash;one for each device-label combination&mdash;to show how many iPhone users were retained/churned and how many Android users were retained/churned.

In [81]:
# Histogram
### YOUR CODE HERE ###

The proportion of churned users to retained users is consistent between device types.
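A cross-tabulation gives the same comparison in numbers: raw counts for the four bars, and row-normalized shares for the churn rate per device. The data below is invented so that both devices have the same rate; it is a sketch of the technique, not a reproduction of the real figures.

```python
import pandas as pd

# Invented device/label pairs for illustration
toy = pd.DataFrame({
    "device": ["iPhone"] * 5 + ["Android"] * 5,
    "label": ["retained", "retained", "retained", "retained", "churned"] * 2,
})

# Counts for each device-label combination (the four bars)
counts = pd.crosstab(toy["device"], toy["label"])

# Share of churned vs. retained within each device
rates = pd.crosstab(toy["device"], toy["label"], normalize="index")
print(rates)
```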

#### **Retention by kilometers driven per driving day**

In the previous course, you discovered that the median distance driven last month for users who churned was 8.33 km, versus 3.36 km for people who did not churn. Examine this further.

1. Create a new column in `waze_data` called `km_per_driving_day`, which represents the mean distance driven per driving day for each user.

2. Call the `describe()` method on the new column.

In [82]:
# 1. Create `km_per_driving_day` column
### YOUR CODE HERE ###

# 2. Call `describe()` on the new column
### YOUR CODE HERE ###

What do you notice? The mean value is infinity, the standard deviation is NaN, and the max value is infinity. Why do you think this is?

This is the result of there being values of zero in the `driving_days` column. Division by zero is undefined, and when a nonzero floating-point value is divided by zero, pandas returns infinity in the corresponding rows of the new column instead of raising an error.

1. Convert these values from infinity to zero. You can use `np.inf` to refer to a value of infinity.

2. Call `describe()` on the `km_per_driving_day` column to verify that it worked.
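The division and the cleanup can be sketched on three invented rows, one of which has zero driving days. The `toy` table and its values are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

# Invented rows; the second user has zero driving days
toy = pd.DataFrame({
    "driven_km_drives": [1200.0, 300.0, 4500.0],
    "driving_days": [10, 0, 15],
})

# Dividing by zero yields inf rather than raising an error
toy["km_per_driving_day"] = toy["driven_km_drives"] / toy["driving_days"]
n_inf = int(np.isinf(toy["km_per_driving_day"]).sum())
print(n_inf)

# Replace infinite values with zero, then confirm
toy["km_per_driving_day"] = toy["km_per_driving_day"].replace([np.inf, -np.inf], 0)
print(toy["km_per_driving_day"].describe())
```

Setting these rows to zero follows the instructions above. Leaving them as missing values instead would keep them out of the mean and median, so the choice affects the summary statistics.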

In [83]:
# 1. Convert infinite values to zero
### YOUR CODE HERE ###

# 2. Confirm that it worked
### YOUR CODE HERE ###

The maximum value is 15,420 kilometers _per drive day_. This is physically impossible. Driving 100 km/hour for 12 hours is 1,200 km. It's unlikely many people averaged more than this each day they drove, so, for now, disregard rows where the distance in this column is greater than 1,200 km.

Plot a histogram of the new `km_per_driving_day` column, disregarding those users with values greater than 1,200 km. Each bar should be the same length and have two colors, one color representing the percent of the users in that bar that churned and the other representing the percent that were retained. This can be done by setting the `multiple` parameter of seaborn's [`histplot()`](https://seaborn.pydata.org/generated/seaborn.histplot.html) function to `fill`.
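With seaborn, the plot described above would typically be a `histplot()` call with `hue="label"` and `multiple="fill"`. The numbers behind such a plot can be sketched in tabular form: filter to at most 1,200 km, bin the values, and compute the share of each label within each bin. The rows and the three bin edges below are invented for illustration.

```python
import pandas as pd

# Invented rows; the 5,000 km value stands in for an implausible outlier
toy = pd.DataFrame({
    "km_per_driving_day": [50, 150, 250, 350, 500, 700, 900, 1100, 5000],
    "label": ["retained", "retained", "retained", "churned", "retained",
              "churned", "churned", "retained", "churned"],
})

# Disregard rows above 1,200 km per driving day
subset = toy[toy["km_per_driving_day"] <= 1200]

# Bin the distances and compute the share of each label within each bin
bins = pd.cut(subset["km_per_driving_day"], bins=[0, 400, 800, 1200])
share = pd.crosstab(bins, subset["label"], normalize="index")
print(share)
```

Each row of `share` sums to 1, which is why every bar in the filled histogram has the same height.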

In [84]:
# Histogram
### YOUR CODE HERE ###

The churn rate tends to increase as the mean daily distance driven increases, confirming what was found in the previous course. It would be worth investigating further the reasons for long-distance users to discontinue using the app.

#### **Churn rate per number of driving days**

Create another histogram just like the previous one, only this time it should represent the churn rate for each number of driving days.
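Because `driving_days` is already a small set of integers, the churn rate per value can also be computed directly with a group-by. In this sketch the rows are invented, and rows without a label are dropped first so they are not silently counted as retained.

```python
import numpy as np
import pandas as pd

# Invented rows, including one with a missing label
toy = pd.DataFrame({
    "driving_days": [0, 0, 0, 0, 0, 0, 10, 10, 10, 10, 30, 30],
    "label": ["churned", "churned", "retained", "retained", "retained", np.nan,
              "churned", "retained", "retained", "retained",
              "retained", "retained"],
})

# Keep only rows with a known label
labeled = toy.dropna(subset=["label"])

# Fraction of churned users for each number of driving days
churn_rate = labeled["label"].eq("churned").groupby(labeled["driving_days"]).mean()
print(churn_rate)
```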

In [85]:
# Histogram
### YOUR CODE HERE ###

The churn rate is highest for people who didn't use Waze much during the last month. The more times they used the app, the less likely they were to churn. While 40% of the users who didn't use the app at all last month churned, nobody who used the app 30 days churned.

This isn't surprising. If people who used the app a lot churned, it would likely indicate dissatisfaction. When people who don't use the app churn, it might be the result of dissatisfaction in the past, or it might be indicative of a lesser need for a navigational app. Maybe they moved to a city with good public transportation and don't need to drive anymore.

#### **Proportion of sessions that occurred in the last month**

Create a new column `percent_sessions_in_last_month` that represents the percentage of each user's total sessions that were logged in their last month of use.
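A minimal sketch of this calculation on invented values follows. Despite the column name, the result here is stored as a fraction between 0 and 1 rather than a percentage; multiplying by 100 is an equally valid convention.

```python
import pandas as pd

# Invented rows for illustration
toy = pd.DataFrame({
    "sessions": [50, 10, 120, 30],
    "total_sessions": [100.0, 200.0, 150.0, 60.0],
})

# Share of each user's total sessions that occurred in the last month
toy["percent_sessions_in_last_month"] = toy["sessions"] / toy["total_sessions"]

median_share = toy["percent_sessions_in_last_month"].median()
print(median_share)
```

Since `total_sessions` is described as a model estimate, it may be worth checking whether any ratios in the real data exceed 1.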

In [86]:
### YOUR CODE HERE ###

What is the median value of the new column?

In [87]:
### YOUR CODE HERE ###

Now, create a histogram depicting the distribution of values in this new column.

In [88]:
# Histogram
### YOUR CODE HERE ###

Check the median value of the `n_days_after_onboarding` variable.

In [89]:
### YOUR CODE HERE ###

Half of the people in the dataset had 40% or more of their sessions in just the last month, yet the overall median time since onboarding is almost five years.

Make a histogram of `n_days_after_onboarding` for just the people who had 40% or more of their total sessions in the last month.
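The filtering step can be sketched with a boolean mask, and `np.histogram` gives the bin counts that a plotted histogram would show. The rows, the five bins, and the 0 to 3,500 range are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration
toy = pd.DataFrame({
    "percent_sessions_in_last_month": [0.10, 0.45, 0.80, 0.39, 0.40],
    "n_days_after_onboarding": [200, 3100, 15, 1800, 2500],
})

# Tenure of users with 40% or more of their sessions in the last month
recent_heavy = toy.loc[
    toy["percent_sessions_in_last_month"] >= 0.4, "n_days_after_onboarding"
]

# Bin counts over the tenure range (five equal-width bins from 0 to 3,500 days)
counts, edges = np.histogram(recent_heavy, bins=5, range=(0, 3500))
print(counts)
```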

In [90]:
# Histogram
### YOUR CODE HERE ###

The number of days since onboarding for users with 40% or more of their total sessions occurring in just the last month is a uniform distribution. This is very strange. It's worth asking Waze why so many long-time users suddenly used the app so much in the last month.

### **Task 3b. Handling outliers**

The box plots from the previous section indicated that many of these variables have outliers. These outliers do not seem to be data entry errors; they are present because of the right-skewed distributions.

Depending on what you'll be doing with this data, it may be useful to impute outlying data with more reasonable values. One way of performing this imputation is to set a threshold based on a percentile of the distribution.

To practice this technique, write a function that calculates the 95th percentile of a given column, then imputes values > the 95th percentile with the value at the 95th percentile.
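One possible shape for such a function is sketched below. The name `cap_at_percentile` is hypothetical, the function works on a copy so the original table is left untouched, and the two toy columns are simple integer ranges chosen so the 95th percentile is easy to verify by hand.

```python
import pandas as pd

def cap_at_percentile(df, columns, percentile=0.95):
    """Return a copy of df with each listed column capped at its given percentile."""
    capped = df.copy()
    for col in columns:
        threshold = capped[col].quantile(percentile)
        # Values above the threshold are replaced by the threshold itself
        capped[col] = capped[col].clip(upper=threshold)
    return capped

# Invented data: integers 1..100 and 101..200
toy = pd.DataFrame({
    "sessions": range(1, 101),
    "drives": range(101, 201),
})

capped = cap_at_percentile(toy, ["sessions", "drives"])
print(capped.describe())
```

After capping, the `max` row of `describe()` should equal the 95th percentile of each column, while the lower quartiles stay unchanged.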



In [91]:
### YOUR CODE HERE ###

Next, apply that function to the following columns:
* `sessions`
* `drives`
* `total_sessions`
* `driven_km_drives`
* `duration_minutes_drives`

In [92]:
### YOUR CODE HERE ###

Call `describe()` to see if your change worked.

In [93]:
### YOUR CODE HERE ###

#### **Conclusion**

Analysis revealed that the overall churn rate is \~17%, and that this rate is consistent between iPhone users and Android users.

Perhaps you feel that the more deeply you explore the data, the more questions arise. This is not uncommon! In this case, it's worth asking the Waze data team why so many users used the app so much in just the last month.

Also, EDA has revealed that users who drive very long distances on their driving days are _more_ likely to churn, but users who drive more often are _less_ likely to churn. The reason for this discrepancy is an opportunity for further investigation, and it would be something else to ask the Waze data team about.

<img src="images/Execute.png" width="100" height="100" align=left>

## **PACE: Execute**

Consider the questions in your PACE Strategy Document to reflect on the Execute stage.

### **Task 4a. Results and evaluation**

Having built visualizations in Python, what have you learned about the dataset? What other questions have your visualizations uncovered that you should pursue?

**Pro tip:** Put yourself in your client's perspective. What would they want to know?

Use the following code fields to pursue any additional EDA based on the visualizations you've already plotted. Also use the space to make sure your visualizations are clean, easily understandable, and accessible.

**Ask yourself:** Did you consider color, contrast, emphasis, and labeling?



==> ENTER YOUR RESPONSE HERE

I have learned ....

My other questions are ....

My client would likely want to know ...




Use the following two code blocks (add more blocks if you like) to do additional EDA you feel is important based on the given scenario.

In [94]:
### YOUR CODE HERE ###


In [95]:
### YOUR CODE HERE ###


### **Task 4b. Conclusion**

Now that you've explored and visualized your data, the next step is to share your findings with Harriet Hadzic, Waze's Director of Data Analysis. Consider the following questions as you prepare to write your executive summary. Think about key points you may want to share with the team, and what information is most relevant to the user churn project.

**Questions:**

1. What types of distributions did you notice in the variables? What did this tell you about the data?

2. Was there anything that led you to believe the data was erroneous or problematic in any way?

3. Did your investigation give rise to further questions that you would like to explore or ask the Waze team about?

4. What percentage of users churned and what percentage were retained?

5. What factors correlated with user churn? How?

6. Did newer users have greater representation in this dataset than users with longer tenure? How do you know?


==> ENTER YOUR RESPONSES TO QUESTIONS 1-6 HERE




**Congratulations!** You've completed this lab. However, you may not notice a green check mark next to this item on Coursera's platform. Please continue your progress regardless of the check mark. Just click on the "save" icon at the top of this notebook to ensure your work has been logged.