# Tutorial 4: Effective Data Visualization 

### Lecture and Tutorial Learning Goals:

Expand your data visualization knowledge and tool set beyond what we have seen and practiced so far. We will move beyond scatter plots and learn other effective ways to visualize data, as well as some general rules of thumb to follow when creating visualizations. All visualization tasks this week will be applied to real world data sets.

After completing this week's lecture and tutorial work, you will be able to:

- Describe when to use the following kinds of visualizations:
    - scatter plots
    - line plots
    - bar plots
- Given a dataset and a question, select from the above plot types to create a visualization that best answers the question
- Given a visualization and a question, evaluate the effectiveness of the visualization and suggest improvements to better answer the question
- Identify rules of thumb for creating effective visualizations
- Define the two key aspects of altair objects:
    - mark objects
    - encodings
- Use the `altair` library in Python to create and refine the above visualizations using:
    - mark objects: `mark_point`, `mark_line`, `mark_bar`
    - encodings: `x`, `y`, `color`, `shape`
    - subplots: `facet`
- Describe the difference between raster and vector output formats
- Use `save` to save visualizations in `png` and `svg` formats

This tutorial covers parts of [Chapter 4](https://python.datasciencebook.ca/viz.html) of the online textbook. You should read this chapter before attempting the worksheet.
Any place you see `___`, you must fill in the function, variable, or data to complete the code. Replace `raise NotImplementedError` with your completed code and answers, then run the cell!

In [None]:
### Run this cell before continuing.
import altair as alt
import pandas as pd

# Simplify working with large datasets in Altair
alt.data_transformers.enable('vegafusion')

**Question 0.1** 
<br> {points: 1}

Match the following definitions with the corresponding encoding mapping or function used in Python:

*Definitions*

A. Creates multiple subplots (or "facets") according to a categorical variable in the data. 

B. Connects data points with a line which facilitates seeing trends over time.

C. Colors graphical marks based on a particular variable in the data.

D. This encoding varies data points by shape based on a variable in the data.

E. Labels the y-axis. 


*Encodings and Functions*

1. `color`
2. `facet()`
3. `mark_line()`
4. `alt.Y().title("...")`
5. `shape`

_For each definition, create an object named with the definition's letter and assign it the number of the matching item from the list above. For example: `B = 1`_

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(A)).encode("utf-8")+b"2fff8").hexdigest() == "3cf2797019a970975c0b9e78a8b07304d2a281fa", "type of A is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(A).encode("utf-8")+b"2fff8").hexdigest() == "1ef96ff0480bd7ebc7252d1480959e4e8eda19e2", "value of A is not correct"

assert sha1(str(type(B)).encode("utf-8")+b"2fff9").hexdigest() == "c3fcdddf7b45274a27b7725a453ef3555d31a101", "type of B is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(B).encode("utf-8")+b"2fff9").hexdigest() == "850b95741634320cfdc1cbc7b5e757a3b03e059c", "value of B is not correct"

assert sha1(str(type(C)).encode("utf-8")+b"2fffa").hexdigest() == "0f070d036896abeed24a95c2b15672a48bf7aa84", "type of C is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(C).encode("utf-8")+b"2fffa").hexdigest() == "91f5c9f56d2ab4c91977c4db1ff6680228e07dbb", "value of C is not correct"

assert sha1(str(type(D)).encode("utf-8")+b"2fffb").hexdigest() == "3bed34b3560b6acf2e5b8819ae4645ad1975c4ab", "type of D is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(D).encode("utf-8")+b"2fffb").hexdigest() == "021e49701a21de802937a347d5222ad0b28fab80", "value of D is not correct"

assert sha1(str(type(E)).encode("utf-8")+b"2fffc").hexdigest() == "5e7bdd36d327347b9eb0eb91f87b26bc6335c7eb", "type of E is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(E).encode("utf-8")+b"2fffc").hexdigest() == "12d2115885fb3d25cf657a2008ec5df67aed749f", "value of E is not correct"

print('Success!')

**Question 0.2** True or False:
<br> {points: 1}

We should save a plot as an `.svg` file if we want to be able to rescale it without losing quality.

*Assign your answer to an object called `answer0_2`. Make sure your answer is either `True` or `False`.*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer0_2)).encode("utf-8")+b"b9cdb").hexdigest() == "c1a14e26e0620ad97d9304575779d7160f258ef4", "type of answer0_2 is not bool. answer0_2 should be a bool"
assert sha1(str(answer0_2).encode("utf-8")+b"b9cdb").hexdigest() == "4fdc4832d144addb42fef1704c9dbccbe858e7ae", "boolean value of answer0_2 is not correct"

print('Success!')

## 1. Data on Personal Medical Costs 

As we saw in the worksheet, data scientists work in all types of organizations and with all kinds of problems. One such type of organization is private-sector companies that work with health data. Today we will be looking at data on personal medical costs. Many factors affect health and, consequently, medical costs. Our goal for today is to determine how several variables are related to the medical costs billed by health insurance companies.


To analyze this, we will be looking at a dataset that includes the following columns:

* `age`: age of primary beneficiary
* `sex`: insurance contractor gender: female, male
* `bmi`: body mass index, an objective index of body weight (kg/$m^{2}$) computed as weight divided by height squared; it indicates whether a weight is relatively high or low for a given height (ideally 18.5 to 24.9)
* `children`: number of children covered by health insurance / number of dependents
* `smoker`: whether the beneficiary smokes
* `region`: the beneficiary's residential area in the US: northeast, southeast, southwest, northwest.
* `charges`: individual medical costs billed by health insurance

*This dataset was taken from the [collection of Data Sets](https://github.com/stedy/Machine-Learning-with-R-datasets) created and curated for the [Machine Learning with R](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-r) book by Brett Lantz.*
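
To make the units of the `bmi` column concrete, here is a minimal sketch of the standard BMI calculation (weight in kilograms divided by height in metres squared). The function name `bmi` and the example person are hypothetical and are not part of the dataset or the assignment.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m^2: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# Hypothetical person: 70 kg and 1.75 m tall.
example_bmi = bmi(70.0, 1.75)
print(round(example_bmi, 1))
```

A value computed this way can be compared directly with the 18.5 to 24.9 range mentioned in the column description.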

**Question 1.1** Yes or No: 
<br> {points: 1}

Based on the information given in the cell above, do you think the column `charges` includes quantitative/numerical data? 

*Assign your answer to an object called `answer1_1`. Make sure your answer is written in lowercase and is surrounded by quotation marks (e.g. `"yes"` or `"no"`).*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_1)).encode("utf-8")+b"678f2").hexdigest() == "4959f65d861fe73fc28048598b050dd992dace50", "type of answer1_1 is not str. answer1_1 should be an str"
assert sha1(str(len(answer1_1)).encode("utf-8")+b"678f2").hexdigest() == "e6d6de58a5553dab314352e9039d6369703bf71d", "length of answer1_1 is not correct"
assert sha1(str(answer1_1.lower()).encode("utf-8")+b"678f2").hexdigest() == "fef55431241493e87811c5ab7d723fb48efe52e2", "value of answer1_1 is not correct"
assert sha1(str(answer1_1).encode("utf-8")+b"678f2").hexdigest() == "fef55431241493e87811c5ab7d723fb48efe52e2", "correct string value of answer1_1 but incorrect case of letters"

print('Success!')

**Question 1.2** Multiple Choice:
<br> {points: 1}

Assuming overplotting is not an issue, which plot would be most effective for exploring the relationship between `age` and `charges`?

A. Scatterplot 

B. Stacked Bar Plot 

C. Bar Plot 

*Assign your answer to an object called `answer1_2`. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. `"F"`).*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_2)).encode("utf-8")+b"b01e4").hexdigest() == "3c9412f874ef42c2aa7fc1cfe82e83b791289e6d", "type of answer1_2 is not str. answer1_2 should be an str"
assert sha1(str(len(answer1_2)).encode("utf-8")+b"b01e4").hexdigest() == "cfb76c7e547605333d2af0aac4faea9a8f49ef06", "length of answer1_2 is not correct"
assert sha1(str(answer1_2.lower()).encode("utf-8")+b"b01e4").hexdigest() == "bd10f1af1d3d8938a93a8c525d3cc6ed24cf5184", "value of answer1_2 is not correct"
assert sha1(str(answer1_2).encode("utf-8")+b"b01e4").hexdigest() == "3e950333ccfb4011b417123c70daadf45728bb03", "correct string value of answer1_2 but incorrect case of letters"

print('Success!')

**Question 1.3**
<br> {points: 1}

Read the `insurance.csv` file in the `data` folder. 

*Assign your answer to an object called `insurance`.*

In [None]:
# your code here
raise NotImplementedError
insurance

In [None]:
from hashlib import sha1
assert sha1(str(type(insurance is None)).encode("utf-8")+b"eaa92").hexdigest() == "a46bef20ff661b158c859b94b242bca573ec682b", "type of insurance is None is not bool. insurance is None should be a bool"
assert sha1(str(insurance is None).encode("utf-8")+b"eaa92").hexdigest() == "972b6069c86580fbdab49b0ea7de0db003f4d366", "boolean value of insurance is None is not correct"

assert sha1(str(type(insurance)).encode("utf-8")+b"eaa93").hexdigest() == "664ae69e7045a753c12a60b9e51bddf6ec16fc24", "type of type(insurance) is not correct"

assert sha1(str(type(insurance.shape)).encode("utf-8")+b"eaa94").hexdigest() == "1759998d95f71f1d3a8a5dfd1ffe605cd2cac10c", "type of insurance.shape is not tuple. insurance.shape should be a tuple"
assert sha1(str(len(insurance.shape)).encode("utf-8")+b"eaa94").hexdigest() == "386d1a62add2a10e46d65bfec4e4e5ef92252695", "length of insurance.shape is not correct"
assert sha1(str(sorted(map(str, insurance.shape))).encode("utf-8")+b"eaa94").hexdigest() == "891e50106b98456481fb4abae73fbc96b6832755", "values of insurance.shape are not correct"
assert sha1(str(insurance.shape).encode("utf-8")+b"eaa94").hexdigest() == "c14fb1b524d89f85e0d6a03a423569b726c075a4", "order of elements of insurance.shape is not correct"

assert sha1(str(type(round(sum(insurance.age), 2))).encode("utf-8")+b"eaa95").hexdigest() == "cfa6511142ce53277e326f529badbc4cd7a54df5", "type of round(sum(insurance.age), 2) is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(round(sum(insurance.age), 2)).encode("utf-8")+b"eaa95").hexdigest() == "ec04bc8e7210856a9edc5596d6e71d4beb99fb60", "value of round(sum(insurance.age), 2) is not correct"

assert sha1(str(type(len(insurance.region.unique()))).encode("utf-8")+b"eaa96").hexdigest() == "33e908f193cf0a8b4a0cc2ef375bcebb381d1ad6", "type of len(insurance.region.unique()) is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(len(insurance.region.unique())).encode("utf-8")+b"eaa96").hexdigest() == "63bd292673c1eef97acd9b1311d8ee1ff7a05985", "value of len(insurance.region.unique()) is not correct"

print('Success!')

**Question 1.4** 
<br> {points: 3}

Looking over the excerpt of loaded data shown above, what initial observations can you make about the relationship between medical charges and age? How about medical charges and BMI? Finally, what about medical charges and smoking? 

Also, comment on whether our observations might change if we visualize the data. Might visualizing the data make it easier to observe relationships than trying to read them directly from the data table?

Answer in the cell below.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.5**
<br> {points: 1}

According to the National Heart, Lung and Blood Institute of the US: "The higher your BMI, the higher your risk for certain diseases such as heart disease, high blood pressure, type 2 diabetes, gallstones, breathing problems, and certain cancers". 

Based on this information, we can hypothesize that individuals with a higher BMI are likely to have more medical costs. Let's use our data and see if this holds true. Create a scatter plot of `charges` (y-axis) versus `bmi` (x-axis).

In the scaffolding we provide below, ensure that you set `opacity` to 0.4. `opacity` sets the transparency of points on a scatter plot, and increasing transparency (reducing opacity) is one strategy to deal with overplotting issues.

*Assign your answer to an object called `bmi_plot`. Make sure to label your axes appropriately.*

In [None]:
# ___ = alt.Chart(
#     insurance,
#     title=___  # Set the title for the entire plot
# ).mark_point(opacity=___).encode(
#     x=alt.X(___)
#         .___(___)
#         .scale(zero=False),
#     y=alt.Y(___).___(___)
# )

# your code here
raise NotImplementedError
bmi_plot

In [None]:
from hashlib import sha1
assert sha1(str(type(bmi_plot is None)).encode("utf-8")+b"472ae").hexdigest() == "1193d6e28acd3d7bdd9fb79cf436b3202175c987", "type of bmi_plot is None is not bool. bmi_plot is None should be a bool"
assert sha1(str(bmi_plot is None).encode("utf-8")+b"472ae").hexdigest() == "bb865e5e89d954ce2bce84d8ca122a2500e2100f", "boolean value of bmi_plot is None is not correct"

assert sha1(str(type(bmi_plot.encoding.x['shorthand'])).encode("utf-8")+b"472af").hexdigest() == "033c9c378005e993b94d4b4d5ff83eaf5f0fb7b3", "type of bmi_plot.encoding.x['shorthand'] is not str. bmi_plot.encoding.x['shorthand'] should be an str"
assert sha1(str(len(bmi_plot.encoding.x['shorthand'])).encode("utf-8")+b"472af").hexdigest() == "4d5afa3748afb4c7d576a3bf25c6c8cd435e30a6", "length of bmi_plot.encoding.x['shorthand'] is not correct"
assert sha1(str(bmi_plot.encoding.x['shorthand'].lower()).encode("utf-8")+b"472af").hexdigest() == "ec9ee9185d778042fb3d534ffde57d6edd72909c", "value of bmi_plot.encoding.x['shorthand'] is not correct"
assert sha1(str(bmi_plot.encoding.x['shorthand']).encode("utf-8")+b"472af").hexdigest() == "ec9ee9185d778042fb3d534ffde57d6edd72909c", "correct string value of bmi_plot.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(bmi_plot.encoding.y['shorthand'])).encode("utf-8")+b"472b0").hexdigest() == "caedcdf77662a3f09655c358a0c07a6fb066b672", "type of bmi_plot.encoding.y['shorthand'] is not str. bmi_plot.encoding.y['shorthand'] should be an str"
assert sha1(str(len(bmi_plot.encoding.y['shorthand'])).encode("utf-8")+b"472b0").hexdigest() == "e2a65ce822ad139fb5a25263adfd107f04afb203", "length of bmi_plot.encoding.y['shorthand'] is not correct"
assert sha1(str(bmi_plot.encoding.y['shorthand'].lower()).encode("utf-8")+b"472b0").hexdigest() == "ee36fc007e7ab4f2f854a182b05520f886cd578c", "value of bmi_plot.encoding.y['shorthand'] is not correct"
assert sha1(str(bmi_plot.encoding.y['shorthand']).encode("utf-8")+b"472b0").hexdigest() == "ee36fc007e7ab4f2f854a182b05520f886cd578c", "correct string value of bmi_plot.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(bmi_plot.mark)).encode("utf-8")+b"472b1").hexdigest() == "09badd2c8fc9cf48f407fcf156460d46ec6a1c6c", "type of bmi_plot.mark is not correct"
assert sha1(str(bmi_plot.mark).encode("utf-8")+b"472b1").hexdigest() == "741424a3397894fd12fdd3edfc3ca92c672928d8", "value of bmi_plot.mark is not correct"

assert sha1(str(type(isinstance(bmi_plot.encoding.x['title'], str))).encode("utf-8")+b"472b2").hexdigest() == "6722f5f88507506a4577baf96c2afb5c72cbd2fe", "type of isinstance(bmi_plot.encoding.x['title'], str) is not bool. isinstance(bmi_plot.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(bmi_plot.encoding.x['title'], str)).encode("utf-8")+b"472b2").hexdigest() == "068b96cae34378e8badaa36a1cac0d2cb0fb89aa", "boolean value of isinstance(bmi_plot.encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(bmi_plot.encoding.y['title'], str))).encode("utf-8")+b"472b3").hexdigest() == "97f0eefc70f11efea8f9fbcf91d1cc37a00069cc", "type of isinstance(bmi_plot.encoding.y['title'], str) is not bool. isinstance(bmi_plot.encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(bmi_plot.encoding.y['title'], str)).encode("utf-8")+b"472b3").hexdigest() == "e86548285752d265d3835ee5ddb44b2b7b68b976", "boolean value of isinstance(bmi_plot.encoding.y['title'], str) is not correct"

print('Success!')

**Question 1.6**
<br> {points: 3}

Analysis: Comment on the effectiveness of the plot. Take into consideration the rules of thumb discussed in lecture. Also comment on what is done well in this plot and what could be improved.

Answer in the cell below.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.7**
<br> {points: 3}

Analysis: What do you observe from the scatter plot? Do the data suggest that there might be evidence of a relationship between BMI and medical costs of individuals? 
From this plot alone, can we say higher BMI causes higher medical charges? Why or why not? 

Answer in the cell below. 

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.8**
<br> {points: 3}

Again, based on information from the National Heart, Lung and Blood Institute of the US, smoking cigarettes is said to be a risk factor for obesity. Create the same plot as you did in **Question 1.5** but this time add the `color` encoding to observe if smoking might affect the body mass of individuals. Keep `opacity = 0.4` to make the scatter points transparent.

*Assign your answer to an object called `smoke_plot`. Make sure to label your axes and the legend appropriately.*

In [None]:
# your code here
raise NotImplementedError
smoke_plot

In [None]:
from hashlib import sha1
assert sha1(str(type(smoke_plot is None)).encode("utf-8")+b"a1d0e").hexdigest() == "5d2729256924d973ded6e6c1e1590736ca5f7c13", "type of smoke_plot is None is not bool. smoke_plot is None should be a bool"
assert sha1(str(smoke_plot is None).encode("utf-8")+b"a1d0e").hexdigest() == "3c602b9285a9b032e6f0799db94eee0318c5939c", "boolean value of smoke_plot is None is not correct"

assert sha1(str(type(smoke_plot.mark)).encode("utf-8")+b"a1d0f").hexdigest() == "632ec55be177ce1b1d81b8fa663a4d1a10eb29f0", "type of smoke_plot.mark is not correct"
assert sha1(str(smoke_plot.mark).encode("utf-8")+b"a1d0f").hexdigest() == "356c4d024752333c8f782e6f29ae4442d56a294a", "value of smoke_plot.mark is not correct"

assert sha1(str(type(smoke_plot.encoding.x['shorthand'])).encode("utf-8")+b"a1d10").hexdigest() == "cf2620faaaadb4567c45c64fd5d8926804584673", "type of smoke_plot.encoding.x['shorthand'] is not str. smoke_plot.encoding.x['shorthand'] should be an str"
assert sha1(str(len(smoke_plot.encoding.x['shorthand'])).encode("utf-8")+b"a1d10").hexdigest() == "243f3bd290a3cc0cf71c746a0a50f9a8a4680870", "length of smoke_plot.encoding.x['shorthand'] is not correct"
assert sha1(str(smoke_plot.encoding.x['shorthand'].lower()).encode("utf-8")+b"a1d10").hexdigest() == "2bf931ac6d652813c22ba6d7b709cfe96698d3ea", "value of smoke_plot.encoding.x['shorthand'] is not correct"
assert sha1(str(smoke_plot.encoding.x['shorthand']).encode("utf-8")+b"a1d10").hexdigest() == "2bf931ac6d652813c22ba6d7b709cfe96698d3ea", "correct string value of smoke_plot.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(smoke_plot.encoding.y['shorthand'])).encode("utf-8")+b"a1d11").hexdigest() == "09572e6399428ff4183d0376bc008f3b135aa932", "type of smoke_plot.encoding.y['shorthand'] is not str. smoke_plot.encoding.y['shorthand'] should be an str"
assert sha1(str(len(smoke_plot.encoding.y['shorthand'])).encode("utf-8")+b"a1d11").hexdigest() == "71a5d754863484cc75c16b2b327025552e7726e1", "length of smoke_plot.encoding.y['shorthand'] is not correct"
assert sha1(str(smoke_plot.encoding.y['shorthand'].lower()).encode("utf-8")+b"a1d11").hexdigest() == "ac89f7221894bb7edf111d4bcdb15322ce667667", "value of smoke_plot.encoding.y['shorthand'] is not correct"
assert sha1(str(smoke_plot.encoding.y['shorthand']).encode("utf-8")+b"a1d11").hexdigest() == "ac89f7221894bb7edf111d4bcdb15322ce667667", "correct string value of smoke_plot.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(smoke_plot.encoding.color['shorthand'])).encode("utf-8")+b"a1d12").hexdigest() == "5e146436c95c3be0c0b5891dba8fa3915c86ad72", "type of smoke_plot.encoding.color['shorthand'] is not str. smoke_plot.encoding.color['shorthand'] should be an str"
assert sha1(str(len(smoke_plot.encoding.color['shorthand'])).encode("utf-8")+b"a1d12").hexdigest() == "410d10639964c884ed8546f208184ca298f414df", "length of smoke_plot.encoding.color['shorthand'] is not correct"
assert sha1(str(smoke_plot.encoding.color['shorthand'].lower()).encode("utf-8")+b"a1d12").hexdigest() == "aaa8ea7f22cfe19d2d5c4963a7f1d43057ddac88", "value of smoke_plot.encoding.color['shorthand'] is not correct"
assert sha1(str(smoke_plot.encoding.color['shorthand']).encode("utf-8")+b"a1d12").hexdigest() == "aaa8ea7f22cfe19d2d5c4963a7f1d43057ddac88", "correct string value of smoke_plot.encoding.color['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(smoke_plot.encoding.x['title'], str))).encode("utf-8")+b"a1d13").hexdigest() == "e363de156d5cd8b7b23cf84c70fe0f89ca8d37a2", "type of isinstance(smoke_plot.encoding.x['title'], str) is not bool. isinstance(smoke_plot.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(smoke_plot.encoding.x['title'], str)).encode("utf-8")+b"a1d13").hexdigest() == "9712a542b912cdffee876ae2cbcf02af75879895", "boolean value of isinstance(smoke_plot.encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(smoke_plot.encoding.y['title'], str))).encode("utf-8")+b"a1d14").hexdigest() == "7a5fc118c48fa58af7a92e7988d8582121842046", "type of isinstance(smoke_plot.encoding.y['title'], str) is not bool. isinstance(smoke_plot.encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(smoke_plot.encoding.y['title'], str)).encode("utf-8")+b"a1d14").hexdigest() == "8bcc6e5b50c7796e4e6a65a604595a2f7fc94b24", "boolean value of isinstance(smoke_plot.encoding.y['title'], str) is not correct"

assert sha1(str(type(isinstance(smoke_plot.encoding.color['title'], str))).encode("utf-8")+b"a1d15").hexdigest() == "730fa2e1e008b8ef8497cebf60771335d691594f", "type of isinstance(smoke_plot.encoding.color['title'], str) is not bool. isinstance(smoke_plot.encoding.color['title'], str) should be a bool"
assert sha1(str(isinstance(smoke_plot.encoding.color['title'], str)).encode("utf-8")+b"a1d15").hexdigest() == "bb9d1e886cf70630ac5d04a05f8f4342256ebba7", "boolean value of isinstance(smoke_plot.encoding.color['title'], str) is not correct"

print('Success!')

**Question 1.9.0** (Analyzing the Graph) True or False: 
<br> {points: 1}

Smokers generally have a lower BMI than non-smokers. 

*Assign your answer to an object called `answer1_9_0`. Make sure your answer is either `True` or `False`.*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_9_0)).encode("utf-8")+b"64932").hexdigest() == "cc281831448f8b8aa80cb5d7c3baf3b17172a353", "type of answer1_9_0 is not bool. answer1_9_0 should be a bool"
assert sha1(str(answer1_9_0).encode("utf-8")+b"64932").hexdigest() == "75278ad1e5b66059430b17d670a4b182671abf11", "boolean value of answer1_9_0 is not correct"

print('Success!')

**Question 1.9.1** (Analyzing the Graph) True or False: 
<br> {points: 1}

Smokers generally have higher medical charges than non-smokers.

*Assign your answer to an object called `answer1_9_1`. Make sure your answer is either `True` or `False`.*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_9_1)).encode("utf-8")+b"726fb").hexdigest() == "62b91597814a3332ac196fa07917c9cdabeb6a4a", "type of answer1_9_1 is not bool. answer1_9_1 should be a bool"
assert sha1(str(answer1_9_1).encode("utf-8")+b"726fb").hexdigest() == "7ae8479f7e92390e97392ac647c12c397eb00174", "boolean value of answer1_9_1 is not correct"

print('Success!')

**Question 1.10**
<br> {points: 1}

Finally, create a bar graph that displays the percentage of smokers for both females and males in the data set. Plot sex on the x-axis, and colour the bars to differentiate between smokers and nonsmokers. This could, for example, be used to help us determine whether we should consider smoking behaviour when exploring whether there is a relationship between sex and medical costs.

To count the number of observations in each bar, we are using the `count()` aggregation in `altair`, just as we did for histograms in the textbook. We then convert the count to a percentage with `stack('normalize')`.
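
The arithmetic behind counting and then normalizing can be sketched in plain `pandas` on a small made-up table. The data frame `toy` and its columns `group` and `flag` are hypothetical stand-ins, not the insurance data, and this only mirrors the idea of turning per-bar counts into within-bar proportions; it is not how `altair` computes it internally.

```python
import pandas as pd

# Hypothetical toy data: two groups, each observation flagged "yes" or "no".
toy = pd.DataFrame({
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "flag": ["yes", "no", "no", "no", "yes", "yes", "no", "no"],
})

# Step 1: count observations for each (group, flag) combination.
counts = pd.crosstab(toy["group"], toy["flag"])

# Step 2: divide each row by its total so every group sums to 1.
proportions = counts.div(counts.sum(axis=1), axis=0)

print(counts)
print(proportions)
```

Because each row of `proportions` sums to 1, every bar in a normalized stacked bar chart has the same height, and only the split between colours differs.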

*Assign your answer to an object called `bar_plot`. Make sure to label your axes appropriately.*

>*Note - many historical datasets treated sex as a variable where the possible values are only binary: male or female. This representation in this question reflects how the data were historically collected and is not meant to imply that we believe that sex is binary.*

In [None]:
# ___ = alt.Chart(
#     insurance,
#     title=___
# ).mark_bar().encode(
#     x=alt.X(___)
#         .title(___),
#     y=alt.Y("count()")
#         .title(___)
#         .stack('normalize'),
#     color=alt.Color(___).title(___)
# )

# your code here
raise NotImplementedError
bar_plot

In [None]:
from hashlib import sha1
assert sha1(str(type(bar_plot is None)).encode("utf-8")+b"3557b").hexdigest() == "541db44201bb8811fa2a7aa74c60ad0d264bfe29", "type of bar_plot is None is not bool. bar_plot is None should be a bool"
assert sha1(str(bar_plot is None).encode("utf-8")+b"3557b").hexdigest() == "b4058927a4281f6f9931dd99d2de6b4a25eeb2fa", "boolean value of bar_plot is None is not correct"

assert sha1(str(type(bar_plot.encoding.x['shorthand'])).encode("utf-8")+b"3557c").hexdigest() == "860a4ea4b83906e827dfd927664e9c6d66aa5223", "type of bar_plot.encoding.x['shorthand'] is not str. bar_plot.encoding.x['shorthand'] should be an str"
assert sha1(str(len(bar_plot.encoding.x['shorthand'])).encode("utf-8")+b"3557c").hexdigest() == "a531be90ab5a5041830961e839554f0ca0e969a5", "length of bar_plot.encoding.x['shorthand'] is not correct"
assert sha1(str(bar_plot.encoding.x['shorthand'].lower()).encode("utf-8")+b"3557c").hexdigest() == "174af161e741636c604e8ca2d86d6eab37d8b941", "value of bar_plot.encoding.x['shorthand'] is not correct"
assert sha1(str(bar_plot.encoding.x['shorthand']).encode("utf-8")+b"3557c").hexdigest() == "174af161e741636c604e8ca2d86d6eab37d8b941", "correct string value of bar_plot.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(bar_plot.encoding.y['shorthand'])).encode("utf-8")+b"3557d").hexdigest() == "c297b0c94630ad81f23fd483b4ebbaf02fb63315", "type of bar_plot.encoding.y['shorthand'] is not str. bar_plot.encoding.y['shorthand'] should be an str"
assert sha1(str(len(bar_plot.encoding.y['shorthand'])).encode("utf-8")+b"3557d").hexdigest() == "69a8c36209b542a7efd5c6610d54448c5b3cc31f", "length of bar_plot.encoding.y['shorthand'] is not correct"
assert sha1(str(bar_plot.encoding.y['shorthand'].lower()).encode("utf-8")+b"3557d").hexdigest() == "66eb23ea94a5c6921be83d0ed9bd406a9931ef10", "value of bar_plot.encoding.y['shorthand'] is not correct"
assert sha1(str(bar_plot.encoding.y['shorthand']).encode("utf-8")+b"3557d").hexdigest() == "66eb23ea94a5c6921be83d0ed9bd406a9931ef10", "correct string value of bar_plot.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(bar_plot.encoding.color['shorthand'])).encode("utf-8")+b"3557e").hexdigest() == "10d352347a8302dc9d49c0ba84d4158f77b5ecc7", "type of bar_plot.encoding.color['shorthand'] is not str. bar_plot.encoding.color['shorthand'] should be an str"
assert sha1(str(len(bar_plot.encoding.color['shorthand'])).encode("utf-8")+b"3557e").hexdigest() == "302fd0bc1284fa3380e2957a3b7db48035c5167a", "length of bar_plot.encoding.color['shorthand'] is not correct"
assert sha1(str(bar_plot.encoding.color['shorthand'].lower()).encode("utf-8")+b"3557e").hexdigest() == "c1ac6ce335226621599f896444cac96a157a8608", "value of bar_plot.encoding.color['shorthand'] is not correct"
assert sha1(str(bar_plot.encoding.color['shorthand']).encode("utf-8")+b"3557e").hexdigest() == "c1ac6ce335226621599f896444cac96a157a8608", "correct string value of bar_plot.encoding.color['shorthand'] but incorrect case of letters"

assert sha1(str(type(bar_plot.encoding.y['stack'])).encode("utf-8")+b"3557f").hexdigest() == "b4767a6da9bfe81091957e989eb4e26fae7a4547", "type of bar_plot.encoding.y['stack'] is not str. bar_plot.encoding.y['stack'] should be an str"
assert sha1(str(len(bar_plot.encoding.y['stack'])).encode("utf-8")+b"3557f").hexdigest() == "c48ba5d795117a9ab8a3545869c89cc2f2134295", "length of bar_plot.encoding.y['stack'] is not correct"
assert sha1(str(bar_plot.encoding.y['stack'].lower()).encode("utf-8")+b"3557f").hexdigest() == "5bd1991c21f0f417a233d1f50b39f14774b03ff6", "value of bar_plot.encoding.y['stack'] is not correct"
assert sha1(str(bar_plot.encoding.y['stack']).encode("utf-8")+b"3557f").hexdigest() == "5bd1991c21f0f417a233d1f50b39f14774b03ff6", "correct string value of bar_plot.encoding.y['stack'] but incorrect case of letters"

assert sha1(str(type(bar_plot.mark)).encode("utf-8")+b"35580").hexdigest() == "d378a0f5e37a2dc87d87ae5dec254dbb09b6edc4", "type of bar_plot.mark is not str. bar_plot.mark should be an str"
assert sha1(str(len(bar_plot.mark)).encode("utf-8")+b"35580").hexdigest() == "37ce8e4f389f78734af7e7c089432bc68a916764", "length of bar_plot.mark is not correct"
assert sha1(str(bar_plot.mark.lower()).encode("utf-8")+b"35580").hexdigest() == "50c25729064f3a96875d0ffaec391091ef2786b0", "value of bar_plot.mark is not correct"
assert sha1(str(bar_plot.mark).encode("utf-8")+b"35580").hexdigest() == "50c25729064f3a96875d0ffaec391091ef2786b0", "correct string value of bar_plot.mark but incorrect case of letters"

assert sha1(str(type(isinstance(bar_plot.encoding.x['title'], str))).encode("utf-8")+b"35581").hexdigest() == "a89256fdb5ad5afc4c82d29996c49e1498fe07b1", "type of isinstance(bar_plot.encoding.x['title'], str) is not bool. isinstance(bar_plot.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(bar_plot.encoding.x['title'], str)).encode("utf-8")+b"35581").hexdigest() == "669e84d4caa3f690df4c5a2c7473af6edc52cec0", "boolean value of isinstance(bar_plot.encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(bar_plot.encoding.y['title'], str))).encode("utf-8")+b"35582").hexdigest() == "cab5bb99158a504e68be04da4d5ef8cf119b3608", "type of isinstance(bar_plot.encoding.y['title'], str) is not bool. isinstance(bar_plot.encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(bar_plot.encoding.y['title'], str)).encode("utf-8")+b"35582").hexdigest() == "ae8208c581f15219a0fd4f073290c3acb878f2bb", "boolean value of isinstance(bar_plot.encoding.y['title'], str) is not correct"

assert sha1(str(type(isinstance(bar_plot.encoding.color['title'], str))).encode("utf-8")+b"35583").hexdigest() == "10dfcca0e1b03f139dd45ebab8c22eef40f4b60a", "type of isinstance(bar_plot.encoding.color['title'], str) is not bool. isinstance(bar_plot.encoding.color['title'], str) should be a bool"
assert sha1(str(isinstance(bar_plot.encoding.color['title'], str)).encode("utf-8")+b"35583").hexdigest() == "0a3db160ba5ee9e844ce4bdd041a53f98083118b", "boolean value of isinstance(bar_plot.encoding.color['title'], str) is not correct"

print('Success!')

**Question 1.11**
<br> {points: 1}

Based on the graph, is the percentage of smokers higher amongst men or women?

*Assign your answer to an object called `answer1_11`. Make sure your answer is in lowercase and is surrounded by quotation marks (e.g. `"male"` or `"female"`).*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_11)).encode("utf-8")+b"db30f").hexdigest() == "4253dbcf0b8a6c2e4856718e1c92fc9f0adc5c65", "type of answer1_11 is not str. answer1_11 should be an str"
assert sha1(str(len(answer1_11)).encode("utf-8")+b"db30f").hexdigest() == "1ee7f645315ff7e68a32c929cb43b2a4359015d5", "length of answer1_11 is not correct"
assert sha1(str(answer1_11.lower()).encode("utf-8")+b"db30f").hexdigest() == "13a93a5ed8eff6ef6bea42dad3f8b6da2940c917", "value of answer1_11 is not correct"
assert sha1(str(answer1_11).encode("utf-8")+b"db30f").hexdigest() == "13a93a5ed8eff6ef6bea42dad3f8b6da2940c917", "correct string value of answer1_11 but incorrect case of letters"

print('Success!')

## 2. Color Schemes (beyond the defaults)
{points: 1}

In the worksheet and this tutorial, you have seen the same colors again and again. These are from the default `altair` color scheme, which is an effective choice in most cases. But what if you want different colors? There are other color palettes available in Altair, and you can change the color scheme by specifying its name as a string inside `scale(scheme='...')`.

The documentation lists [all the available color schemes](https://vega.github.io/vega/docs/schemes/) and describes what type of data each color scheme is useful for (categorical, sequential, diverging, cyclic). The default categorical color scheme used in Altair is "Tableau10", which consists of 10 colors and starts with blue, orange, and red. You can also [create your own color scheme](https://altair-viz.github.io/user_guide/customization.html#color-domain-and-range) by supplying the names of the individual colors you want to use (there are [many color names to choose from](https://stackoverflow.com/questions/22408237/named-colors-in-matplotlib/37232760#37232760)).

Use the chart you created in Q1.10 and change the color scheme to your favourite from the documentation page linked above. Remember that instead of recreating the entire chart from scratch, you can use the `bar_plot` variable you already created and re-encode only the color channel using the same variable `smoker`, but a different color scheme (it is also fine if you prefer to copy all the code). Optionally, you can also use this [color blindness simulator](https://www.color-blindness.com/coblis-color-blindness-simulator/) to check if your visualization is color blind friendly.

*Assign your answer to an object called `bar_plot_colorscheme`.*

In [None]:
# bar_plot_colorscheme = bar_plot.encode(
#     ___=alt.Color(___)
#         .___(___=___)
#         .___(___)
# )

# your code here
raise NotImplementedError
bar_plot_colorscheme

In [None]:
from hashlib import sha1
assert sha1(str(type(bar_plot_colorscheme is None)).encode("utf-8")+b"1e368").hexdigest() == "4c81312d5578e0635dab347e60912e32f3db1c24", "type of bar_plot_colorscheme is None is not bool. bar_plot_colorscheme is None should be a bool"
assert sha1(str(bar_plot_colorscheme is None).encode("utf-8")+b"1e368").hexdigest() == "e504b451562234b791acab34bcf5a681ea3ffd27", "boolean value of bar_plot_colorscheme is None is not correct"

assert sha1(str(type(bar_plot.encoding.color['shorthand'])).encode("utf-8")+b"1e369").hexdigest() == "b942f73655e3a64ab72eb825ec839ae572b809ce", "type of bar_plot.encoding.color['shorthand'] is not str. bar_plot.encoding.color['shorthand'] should be an str"
assert sha1(str(len(bar_plot.encoding.color['shorthand'])).encode("utf-8")+b"1e369").hexdigest() == "d656e3e7116d74847c779cad82de4598c956ee1a", "length of bar_plot.encoding.color['shorthand'] is not correct"
assert sha1(str(bar_plot.encoding.color['shorthand'].lower()).encode("utf-8")+b"1e369").hexdigest() == "6f6f987047b281444ce5ce2847b82310fd2586cb", "value of bar_plot.encoding.color['shorthand'] is not correct"
assert sha1(str(bar_plot.encoding.color['shorthand']).encode("utf-8")+b"1e369").hexdigest() == "6f6f987047b281444ce5ce2847b82310fd2586cb", "correct string value of bar_plot.encoding.color['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(bar_plot_colorscheme.encoding.color['scale']['scheme'], str))).encode("utf-8")+b"1e36a").hexdigest() == "e8e1e44b3e7e6391507e51d8c18b3ec33ab47eaa", "type of isinstance(bar_plot_colorscheme.encoding.color['scale']['scheme'], str) is not bool. isinstance(bar_plot_colorscheme.encoding.color['scale']['scheme'], str) should be a bool"
assert sha1(str(isinstance(bar_plot_colorscheme.encoding.color['scale']['scheme'], str)).encode("utf-8")+b"1e36a").hexdigest() == "0e2b36052678af9c24b4ff9a0f1c1f3e1ad500c5", "boolean value of isinstance(bar_plot_colorscheme.encoding.color['scale']['scheme'], str) is not correct"

assert sha1(str(type(isinstance(bar_plot.encoding.color['title'], str))).encode("utf-8")+b"1e36b").hexdigest() == "dde6f4be83013bb75e8d10a9ae819c8452e8107a", "type of isinstance(bar_plot.encoding.color['title'], str) is not bool. isinstance(bar_plot.encoding.color['title'], str) should be a bool"
assert sha1(str(isinstance(bar_plot.encoding.color['title'], str)).encode("utf-8")+b"1e36b").hexdigest() == "3ba949bbe46dcacbe99eed43d880e4176fc23e0f", "boolean value of isinstance(bar_plot.encoding.color['title'], str) is not correct"

print('Success!')

## 3. Fast-Food Chains in the United States (Continued)
<br> {points: 3}

In `worksheet_viz`, we explored this data set together through some visualizations. Now, it is all up to you. The goal of this assignment is to create **one** plot that helps you decide which restaurant to open and where. As in the worksheet, you want to figure out which fast food chain to open and which state would be the least competitive.

After creating your visualization you need to write a paragraph explaining your visualization and why you chose it. Also, explain your conclusion from the visualization and reasoning as to how you came to that conclusion. You can use properly-cited outside information here to help support your reasoning (but **do not** download and analyze any data from an outside source in this notebook -- our autograder will not be able to see it). Finally, if there is some way that you could improve your visualization, but don't yet know how to do it, please explain what you would do if you knew how.

In answering this question, there is no need to restrict yourself to the west coast of the USA. Consider all states that you have data for. You have a variety of graphs to choose from, but before starting the assignment, discuss with a partner which plot would be the most suitable for answering this question.

*Note that some restaurant names are spelled incorrectly in the data. For the purpose of this exercise you can ignore this and only count the spelling with the most entries for each restaurant.*
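One way to see which spelling has the most entries is to count rows per name. The sketch below uses a small made-up data frame; the column names `name` and `st` and all the values are assumptions for illustration and may not match the real data set.

```python
import pandas as pd

# Toy data, invented for this sketch only (note the variant spelling)
toy_restaurants = pd.DataFrame({
    "name": ["McDonald's", "McDonald's", "Mc Donald's", "Subway", "Subway", "Subway"],
    "st": ["CA", "OR", "CA", "CA", "WA", "WA"],
})

# Number of rows for each distinct spelling, most frequent first
spelling_counts = toy_restaurants["name"].value_counts()

# Number of rows for each (name, state) pair, as a tidy data frame
per_state = (
    toy_restaurants
    .groupby(["name", "st"])
    .size()
    .reset_index(name="n")
)
```

A tidy table like `per_state` has one row per restaurant and state, which is the shape Altair expects when encoding a count on one axis and a category on another.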

<img src="mcdonalds.jpg" width = "600"/>

In [None]:
# Write the code for your plot here
# your code here
raise NotImplementedError

*Write a paragraph explaining your visualization and why you chose it. Also explain your conclusion from the visualization and reasoning as to how you came to that conclusion. If you need to bring in outside information to help you answer your question, please feel free to do so. Finally, if there is some way that you could improve your visualization, but don't yet know how to do it, please explain what you would do if you knew how.*

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.