<a href="https://colab.research.google.com/github/pmontman/tmp_choicemodels/blob/main/nb/tutorials/practice_insemest.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PRACTICE for the QBUS3840 In-semester exam

# Rubric 
* The **marking scheme** is simple: each question has some points assigned. The points for each question are then divided between
  * Code: 50% if it works correctly, 35% if it has minor problems, 20% if it does not work but is well explained.
  * Text explanations: 40% if the text is clearly written; complete, with all points addressed; properly justified, giving the right reasons for the answer; and demonstrates knowledge of the topic by explaining nuances and alternatives. The mark degrades from 40% as the text fails to achieve this.
  * Appearance: the remaining 10%. Structure the answer in sections if needed. Properly sized inputs. An even mix of code cells and explanations instead of very few large cells. Code should be readable.



# Guidelines

* The exam will be a Colab notebook that you have to fill in and then upload to Canvas.

* You will have **90 minutes** to do the exam, plus 15 minutes to upload. This is an **important point**: try to become familiar with the functions needed to run biogeme, pandas, numpy... Even though you have full access to the course material and can look at online programming forums for Python issues, doing so might take some extra time. You can also prepare your own auxiliary functions to reduce the verbosity of biogeme. We will see this in the practice notebook.

* **The questions will be very similar to what you will see in this practice notebook.** The point of the exam is to prove that you can do a basic analysis with a multinomial logit and use biogeme as a tool. The differentiation will come mostly from the type of data, the variables involved and the 'what-if scenario' questions.

* The answers should be technically correct, but **the explanation in the text cells should demonstrate knowledge** of what you are doing. Why did you make that decision? For example, for a variable transformation, why did you choose that particular one? Why did you choose to add that variable to the model? What do you think it is going to do? After the results: are the results as expected? Please do not be afraid of being 'too obvious' when explaining something.
  When explaining coefficients: do they have the expected sign, and what is the interpretation with respect to the reference alternative?
  Perfect code with no explanation will net you 50-60% of the marks. In the opposite case, if you get stuck with a Python issue but know 'conceptually' how to answer, writing a good text explanation and some pseudocode can net you up to half marks.


* There will be no data cleaning involved, and the dataset will have full availability. You can create a full availability dictionary to pass to biogeme by just setting all entries to 1: `av = {1:1, 2:1, 3:1, ...}`. We will see this in the practice.

* Please **do not identify yourself explicitly in the answers** by writing your name or student ID. Besides that, you are free to express yourself.

* The 'visual appearance' part of the exam accounts for a small percentage of the mark, 10%. Aim for clarity: do not leave very large code cells followed or preceded by large text cells; interleave them so the notebook is more natural to follow. Split your answers into sections if they become long or address different topics. Do not write very long outputs; for example, do not print the full dataset. It is critical that the main part of each answer is clearly identified with its own text cell and code cell. A bad example would be a large text cell explaining all the steps and then a large code cell that prints the output, with the answers in the middle of the code. Of course, if your answer does not require code (it might happen), do not force a code chunk in. We will mark the visual part by reading the notebook: if something stands out in a negative sense, it subtracts points.

* There can be one or two small 'theoretical' questions that can be answered directly by understanding some theoretical concepts. These questions can also be solved practically by estimating a model to 'try' the ideas.
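
The full availability dictionary mentioned in the guidelines can be built with a dictionary comprehension instead of typing every entry. This is a minimal sketch; the variable `n_alternatives` and its value of 4 are assumptions for illustration and should match the number of alternatives in your dataset.

```python
# Number of alternatives in the choice set (4 is an assumption for illustration).
n_alternatives = 4

# Full availability: every alternative code 1..n maps to 1 (always available).
av = {alt: 1 for alt in range(1, n_alternatives + 1)}

print(av)
```

The keys must be the same integer codes that the choice variable uses, because biogeme matches utilities and availabilities by those codes.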




# The practice problem
We will model a dataset of choice of 'recreational fishing' mode. Fishers choose whether to go for a fishing trip from the beach, from the pier, on a public charter boat or on a private boat. The data was collected via phone interview, and
the attributes of the alternatives are the cost of the trip and the 'catch rate', the expected number of catches per hour for the particular species of fish that each fisher was targeting on their trip.
The socio-economic characteristic is income. In fact, the dataset was used to study different transformations of the income and price variables and how they influence utility, drawing deeper consequences for economic theory.

The reference study, including a more detailed description of the dataset, can be found [here (Section IV Data and references therein)](https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=1017&context=econ_las_pubs).

## Description of the dataset

Each row represents a different customer; customers are 'independent' of each other.

The variables in the dataset are:

**mode**: a categorical variable indicating the fishing mode selected for the trip. It is encoded in numbers, with the following codes:
 1. Beach
 2. Pier
 3. Private boat
 4. Charter boat

**price_x**: Cost of the fishing mode, in dollars, where x stands for one of the alternatives, e.g. price_beach is the cost of one fishing trip from the beach.

**catch_x**: Catch rate, in catches per hour, where x stands for one of the alternatives, e.g. catch_beach is the catch rate of the beach alternative.

**income**: Monthly income of the recreational fisher, in dollars.


---
---

# Preparing the environment
*The preparation and dataset loading code is given to the students*

In [None]:
!pip install biogeme



Load the packages, feel free to change the names.

In [None]:
import pandas  as pd
import numpy as np
import matplotlib.pyplot as plt

import biogeme.database as db
import biogeme.biogeme as bio
import biogeme.models as models
import biogeme.expressions as exp

# Load the dataset

In [None]:
path = 'https://raw.githubusercontent.com/pmontman/pub-choicemodels/main/data/fishing.csv'
fish_pd = pd.read_csv(path)

A simple look at the dataset.

In [None]:
fish_pd.head(5)

| Unnamed: 0 | mode | price_beach | price_pier | price_boat | price_charter | catch_beach | catch_pier | catch_boat | catch_charter | income |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 157.93 | 157.93 | 157.93 | 182.93 | 0.0678 | 0.0503 | 0.2601 | 0.5391 | 7083.3317 |
| 1 | 4 | 15.114 | 15.114 | 10.534 | 34.534 | 0.1049 | 0.0451 | 0.1574 | 0.4671 | 1249.9998 |
| 2 | 3 | 161.874 | 161.874 | 24.334 | 59.334 | 0.5333 | 0.4522 | 0.2413 | 1.0266 | 3749.9999 |
| 3 | 2 | 15.134 | 15.134 | 55.93 | 84.93 | 0.0678 | 0.0789 | 0.1643 | 0.5391 | 2083.3332 |
| 4 | 3 | 106.93 | 106.93 | 41.514 | 71.014 | 0.0678 | 0.0503 | 0.1082 | 0.324 | 4583.332 |
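
Before modelling, it can help to look at the observed share of each mode, since the alternative specific constants will largely reflect those shares. The sketch below uses only the five `mode` values printed above, so the numbers are not representative; in the notebook you would use `fish_pd['mode']` instead. The dictionary `mode_names` is a helper introduced here for illustration, with labels that follow the column suffixes.

```python
import pandas as pd

# Labels follow the coding in the dataset description (helper introduced for illustration).
mode_names = {1: "beach", 2: "pier", 3: "boat", 4: "charter"}

# The five 'mode' values from the head of the dataset; replace with fish_pd['mode'].
modes = pd.Series([4, 4, 3, 2, 3], name="mode")

# Relative frequency of each chosen alternative.
observed_shares = modes.map(mode_names).value_counts(normalize=True)
print(observed_shares)
```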


---
---

# 1) Estimate a model with alternative specific constants and shared parameters for price and catch rate. Select one of the alternatives as the reference (pick the one that you prefer). Comment on the results: signs of the variables and alternative specific constants.
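
One possible way to write down the specification asked for here is shown below. Taking the beach as the reference alternative is an arbitrary choice made for this sketch; any other alternative works, and the remaining constants are then interpreted relative to it.

$$
V_j = \text{ASC}_j + \beta_{\text{price}}\,\text{price}_j + \beta_{\text{catch}}\,\text{catch}_j,
\qquad j \in \{\text{beach}, \text{pier}, \text{boat}, \text{charter}\},
\qquad \text{ASC}_{\text{beach}} = 0
$$

$$
P(j) = \frac{e^{V_j}}{\sum_{k} e^{V_k}}
$$

With this form we would expect $\beta_{\text{price}} < 0$ (a more expensive trip is less attractive) and $\beta_{\text{catch}} > 0$ (a higher catch rate is more attractive).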

---
---

# 2) Calculate the willingness to pay for increasing the catch rate and comment on the interpretation
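
For a utility that is linear in price and catch rate, as in the shared-parameter model of Exercise 1, the willingness to pay is the ratio of the two coefficients: the price increase that exactly offsets one extra catch per hour, so that utility stays constant.

$$
\text{WTP}_{\text{catch}} = -\frac{\partial V / \partial\, \text{catch}}{\partial V / \partial\, \text{price}} = -\frac{\beta_{\text{catch}}}{\beta_{\text{price}}}
$$

A minimal sketch of the calculation follows. The coefficient values are hypothetical placeholders, not estimates from this dataset; replace them with the values from your own estimation results.

```python
def willingness_to_pay(beta_attribute: float, beta_price: float) -> float:
    """Dollars given up for one extra unit of the attribute, holding utility constant."""
    return -beta_attribute / beta_price


# Hypothetical coefficients for illustration only (NOT estimated from the data).
beta_price = -0.025   # utility per dollar
beta_catch = 0.36     # utility per (catch per hour)

wtp_catch = willingness_to_pay(beta_catch, beta_price)
print(f"WTP for one extra catch per hour: {wtp_catch:.2f} dollars per trip")
```

The units matter for the interpretation: price is in dollars per trip and catch rate in catches per hour, so the ratio is in dollars per trip for each additional catch per hour.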



---
---

# 3) Fit per-alternative parameters for cost and catch rate. Add one variable that has not been considered, apply a transformation of your choosing (to the new or other variables) and estimate a new model. Comment on the results and compare the new model to the model in Exercise 1. What changes are relevant? Is the new model a better fit?
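
To compare the new model with the one from Exercise 1, the final log-likelihood alone is not enough, because a model with more parameters never fits worse in-sample. Information criteria penalise the extra parameters; if the Exercise 1 model is nested in the new one, a likelihood ratio test is an alternative. The sketch below computes AIC and BIC. All log-likelihoods, parameter counts and the sample size are hypothetical placeholders; take the real ones from your estimation output.

```python
import math


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * loglik


# Hypothetical values for illustration only.
n_obs = 1000
loglik_1, k_1 = -1230.0, 5    # e.g. 3 constants + 2 shared coefficients
loglik_3, k_3 = -1190.0, 14   # e.g. per-alternative coefficients + a new variable

aic_1, aic_3 = aic(loglik_1, k_1), aic(loglik_3, k_3)
bic_1, bic_3 = bic(loglik_1, k_1, n_obs), bic(loglik_3, k_3, n_obs)
print(f"AIC: model 1 = {aic_1:.1f}, model 3 = {aic_3:.1f}")
print(f"BIC: model 1 = {bic_1:.1f}, model 3 = {bic_3:.1f}")
```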


---
---

# 4) Calculate the accuracy and the confusion matrix of that model, and comment on the results.
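
Once the model has produced a choice probability for every alternative and individual, the predicted choice is the alternative with the highest probability, and accuracy is the fraction of individuals for whom it matches the observed choice. The sketch below shows this with pandas. The probabilities in `probs` are made-up numbers for five individuals, not model output; in the notebook they would come from simulating your estimated model.

```python
import pandas as pd

# Made-up choice probabilities: one row per individual, one column per alternative code.
probs = pd.DataFrame(
    [
        [0.10, 0.10, 0.30, 0.50],
        [0.20, 0.20, 0.35, 0.25],
        [0.05, 0.05, 0.60, 0.30],
        [0.40, 0.30, 0.20, 0.10],
        [0.10, 0.10, 0.45, 0.35],
    ],
    columns=[1, 2, 3, 4],
)
observed = pd.Series([4, 4, 3, 2, 3], name="observed")

# Predicted choice: the column label with the highest probability in each row.
predicted = probs.idxmax(axis=1).rename("predicted")

accuracy = (predicted == observed).mean()
confusion = pd.crosstab(observed, predicted)  # rows: observed, columns: predicted

print(f"Accuracy: {accuracy:.2f}")
print(confusion)
```

Note that `pd.crosstab` only lists the codes that actually appear, so an alternative that is never predicted has no column. That absence is itself worth commenting on, since logit models tend to under-predict rarely chosen alternatives.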


---
---

# 5) Suppose that the company that runs the charter boats is offering a 75% discount for the population with a monthly income under 2100 dollars. What would be the market share for each of the alternatives in the new situation? Use your model in Exercise 3.
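
A what-if scenario of this kind is usually handled by copying the dataset, modifying the affected variable only for the affected individuals, and averaging the predicted probabilities to get market shares. The sketch below illustrates the mechanics on the five rows printed earlier. The coefficients and the simple shared-parameter utility are hypothetical stand-ins for the model of Exercise 3, and `mnl_probabilities` is a helper written for this example, not a biogeme function.

```python
import numpy as np
import pandas as pd

alts = ["beach", "pier", "boat", "charter"]

# The five rows shown in the head of the dataset (the exam would use all rows).
sample = pd.DataFrame({
    "price_beach": [157.93, 15.114, 161.874, 15.134, 106.93],
    "price_pier": [157.93, 15.114, 161.874, 15.134, 106.93],
    "price_boat": [157.93, 10.534, 24.334, 55.93, 41.514],
    "price_charter": [182.93, 34.534, 59.334, 84.93, 71.014],
    "catch_beach": [0.0678, 0.1049, 0.5333, 0.0678, 0.0678],
    "catch_pier": [0.0503, 0.0451, 0.4522, 0.0789, 0.0503],
    "catch_boat": [0.2601, 0.1574, 0.2413, 0.1643, 0.1082],
    "catch_charter": [0.5391, 0.4671, 1.0266, 0.5391, 0.324],
    "income": [7083.3317, 1249.9998, 3749.9999, 2083.3332, 4583.332],
})

# Hypothetical coefficients for illustration only (NOT estimates).
asc = {"beach": 0.0, "pier": 0.3, "boat": 0.9, "charter": 1.5}
beta_price, beta_catch = -0.025, 0.36


def mnl_probabilities(df: pd.DataFrame) -> pd.DataFrame:
    """Multinomial logit probabilities, one column per alternative."""
    v = pd.DataFrame({
        a: asc[a] + beta_price * df[f"price_{a}"] + beta_catch * df[f"catch_{a}"]
        for a in alts
    })
    exp_v = np.exp(v.sub(v.max(axis=1), axis=0))  # subtract row max for stability
    return exp_v.div(exp_v.sum(axis=1), axis=0)


# Scenario: 75% discount on the charter price for monthly income under 2100 dollars.
scenario = sample.copy()
low_income = scenario["income"] < 2100
scenario.loc[low_income, "price_charter"] *= 0.25

shares_before = mnl_probabilities(sample).mean()
shares_after = mnl_probabilities(scenario).mean()
print(pd.DataFrame({"before": shares_before, "after": shares_after}))
```

Market shares are computed as the average of the individual probabilities rather than by counting predicted choices, which keeps the information about how uncertain each prediction is.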

---
---

# 6) Due to poor weather conditions at sea, the fishing trips that go farther away from the coast (both private and charter boats) are going to cut their capture rate by half during the season. What would be the expected impact on the total revenue from fishing trips during the season? Assume that everything else stays the same (the same fishers still go for a trip and the remaining variables do not change). Use your model in Exercise 3.
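
One way to read 'total revenue' is as the expected amount paid over all fishers: for each individual, the price of each alternative weighted by the probability of choosing it. That reading, the coefficients and the shared-parameter utility below are all assumptions made for this sketch; the data are the five rows printed earlier, and `expected_revenue` is a helper written for this example.

```python
import numpy as np

# Columns: beach, pier, boat, charter. Rows: the five individuals from the head of the dataset.
price = np.array([
    [157.93, 157.93, 157.93, 182.93],
    [15.114, 15.114, 10.534, 34.534],
    [161.874, 161.874, 24.334, 59.334],
    [15.134, 15.134, 55.93, 84.93],
    [106.93, 106.93, 41.514, 71.014],
])
catch = np.array([
    [0.0678, 0.0503, 0.2601, 0.5391],
    [0.1049, 0.0451, 0.1574, 0.4671],
    [0.5333, 0.4522, 0.2413, 1.0266],
    [0.0678, 0.0789, 0.1643, 0.5391],
    [0.0678, 0.0503, 0.1082, 0.324],
])

# Hypothetical coefficients for illustration only (NOT estimates).
asc = np.array([0.0, 0.3, 0.9, 1.5])
beta_price, beta_catch = -0.025, 0.36


def expected_revenue(price: np.ndarray, catch: np.ndarray) -> float:
    """Sum over individuals and alternatives of choice probability times price."""
    v = asc + beta_price * price + beta_catch * catch
    exp_v = np.exp(v - v.max(axis=1, keepdims=True))  # subtract row max for stability
    probs = exp_v / exp_v.sum(axis=1, keepdims=True)
    return float((probs * price).sum())


# Scenario: catch rates of private boat and charter boat (last two columns) are halved.
catch_bad = catch.copy()
catch_bad[:, 2:] *= 0.5

revenue_base = expected_revenue(price, catch)
revenue_bad = expected_revenue(price, catch_bad)
change = revenue_bad - revenue_base
print(f"Baseline: {revenue_base:.2f}, poor weather: {revenue_bad:.2f}, change: {change:.2f}")
```

The sign of the change is not obvious in advance: fishers move away from the boat alternatives, and whether revenue rises or falls depends on how the beach and pier prices they face compare with the boat prices they no longer pay.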