# Activity 1 - core probability concepts

**GOALS [NOTION]**:
* core probability lingo
  * events and probability distributions
  * sum rule (_a.k.a._ union of events)
  * conditional probability
  * product rule (_a.k.a._ chain rule)
  * independence
* random variables [r.v.]
  * probability mass function
  * cumulative distribution function
  * expected value and variance
  * discrete vs continuous r.v.'s
  * joint probability
  * marginal probability
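
For reference, here is how the rules named above usually look in symbols. This is a sketch in standard notation: $A$ and $B$ stand for any two events, and the conditional probability assumes $P(B) > 0$.

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \qquad \text{(sum rule)}$$

$$P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad \text{(conditional probability)}$$

$$P(A \cap B) = P(A|B)\,P(B) \qquad \text{(product rule)}$$

$$P(A \cap B) = P(A)\,P(B) \qquad \text{(independence)}$$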

## 1.1 there is reason behind!

Probability theory arises as the _optimal_ way to measure plausibility quantitatively. There - we said it.

When we talk about the probability of an event, there's always a set of assumptions that go with it - and speaking of probabilities wouldn't ever make sense otherwise.

So let's consider the setting for this (sub-)activity, and the assumptions that are implicit in it - making them _explicit enough_:
* you have a deck of cards sitting on top of a table;
* the deck is clearly wasted, and it is the first time you take a look at it;
* you know - or some friend just told you - that the deck is full: it has 52 cards;
* you cannot resist peeking at the first one.

Now:
1. What probability _are you willing_ to attribute to picking the ace of spades? (the implicit _pretension_ here is that _you_ are willing to be _rational_ and have no reason to believe one card is _more likely_ to show up than any other!)
2. Let's hold on to that pretension; say $A$ represents the event that you pick an ace - any ace - what's $P(A)$?
3. Say $B$ represents the event that you pick a spades card - any spade... - what's $P(B)$?
4. What's the probability that you get either an ace or a spade? - or, _in other words_, what's $P(A\cup B)$?
5. Use the sum rule to arrive at the same result.
6. Now say your friend peeked at the card and didn't show it to you; instead, she told you it is indeed a spades card (our suspicion was right all along!) - what's the probability that it is a number card?
7. Say $N$ represents the event that you pick a number card - how would you represent the _expression_ implicit in the last question with some beautiful probability lingo/jargon/symbols/...hieroglyphs?
8. If your friend had instead told you it was an even number, what would you then _assign_ for the probability of it being a spade?
9. Is it any different than the probability of being a spade without any differentiating prior knowledge about the card?
10. Are the two events - $E$ for drawing an even number, and $B$ for drawing a spade - independent?
11. Use the product rule (_a.k.a._ the chain rule) to support your answer.
12. What's $P(A|E)$?
13. Is $A$ independent from $E$?
14. What's $P(E\cap B|N)$?
15. Now say you want to peek not only at the first card, but at the first two; say also that $N_1$ and $N_2$ represent the events that the first and second cards are numbers, respectively. What's $P(N_1, N_2)$?
16. Following the same "subscripting logic" (- let's call it that), what's $P(E_2)$?
17. What's $P(E_2,N_1)$? - is it the same as $P(N_1, E_2)$? - what about $P(N_2, E_1)$?
18. What's $P(N_1|E_2)$? - fun (use the rules - always better than thinking! - cof... cof!)
19. Tricky question alert: why is it fun? - is it the question; is it the result? - compare it with $P(N_2|E_1)$.
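
Once you have your own answers, one way to double-check the single-card questions is to let `python` do the counting. The sketch below is only that, a sketch: it assumes a standard 52-card deck with every card equally likely to be on top, and the names `DECK`, `prob`, `is_ace` and `is_spade` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: a standard 52-card deck, every card equally likely to be on top.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def prob(event, given=lambda card: True):
    """P(event | given), obtained by counting equally likely cards."""
    conditioned = [card for card in DECK if given(card)]
    favourable = [card for card in conditioned if event(card)]
    return Fraction(len(favourable), len(conditioned))


def is_ace(card):
    return card[0] == "A"


def is_spade(card):
    return card[1] == "spades"


p_a = prob(is_ace)
p_b = prob(is_spade)
p_a_and_b = prob(lambda c: is_ace(c) and is_spade(c))
p_a_or_b = prob(lambda c: is_ace(c) or is_spade(c))

# Sum rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p_a_or_b == p_a + p_b - p_a_and_b
# Product rule: P(A and B) = P(A | B) * P(B)
assert p_a_and_b == prob(is_ace, given=is_spade) * p_b

print(p_a, p_b, p_a_or_b)
```

The same `prob` helper works for any other event you can write as a function of a card, so the remaining single-card questions only need a new predicate (you still have to decide what counts as a number card).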

## 1.2 there might be a simulation!

Let's now consider an event of a more numerical nature - the land of random variables (henceforth _r.v._).

Say you have a 6-faced dice:
* it seems very well balanced - perfectly and symmetrically polished (across all of its _main_ axes);
* surprisingly enough, you are concerned with the number of dots on the face that ends up upwards (you could instead be concerned with the number of times it would hit the floor - were you to throw it inconsiderately);
* with enough patience, you can throw it any number of times you wish, and register the result.

Since _we_ don't actually have a dice, or at least nowhere to throw it together, let's simulate this one with `python`'s `random` module.

1. Just for (re)starts: what probability would you attribute to each outcome of the dice?
2. Say $X$ is the r.v. associated with the outcome of the dice, what's its sample space?
3. What's the probability mass function of $X$?
4. What's $P(X>4)$?
5. What's the cumulative distribution function of $X$?
6. What's $P(X>4|X \text{ is prime})$? (- say your same friend has thrown it; didn't show you yet again; and tells you it's a prime)
7. Say you are to make multiple throws, and $X_1$ represents the outcome of the first one, $X_2$ the outcome of the second, and so on, and so on. Do you have _reason_ to _believe_ $X_1$ is independent of $X_2$?
8. What would then be $P(X_2 = 3|X_1 = 3)$?
9. What about $P(X_2 = 3, X_1 = 3)$?
10. Do you have reason to believe that they - $X_1$ and $X_2$ (or for that matter, $X_3$ and $X_i$) - are _identically distributed_?
11. Say you always answered in the affirmative (- welcome friend!) - use `python` to simulate multiple throws of our dice: make a function that, given a number of throws `n`, returns a random sequence of outcomes of `n` throws (check it works) - or, _in other words_, the observed values of $(X_1, ..., X_n)$, _a.k.a._ the sample $(x_1, ..., x_n)$.
12. Use that function to plot the absolute frequencies of each outcome on sequences of 10, 100, and 1000 throws (that's 3 plots).
13. Describe (in English) what you observe on the plots as `n` gets bigger.
14. Does it match what you expected it to be? (- the sweet yes or no)
15. What's the expected value for the distribution associated with $X$?
16. What's the variance? - you may use your `python` calculator to get the _true_ value.
17. Compare the answers of the last 2 questions with the _sample average_ and _sample variance_ (respectively) of some simulated 100 throws.
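
If you want something to compare your own attempt against, here is a minimal sketch of the simulation using only the standard library. It assumes a fair dice with independent throws; `throw_dice`, `exact_mean` and `exact_var` are names invented for this example, and the counts are printed as plain lists rather than plotted.

```python
import random
from collections import Counter
from fractions import Fraction
from statistics import mean, variance

FACES = range(1, 7)


def throw_dice(n, rng=random):
    """Return a list with the outcomes of n independent throws of a fair 6-faced dice."""
    return [rng.randint(1, 6) for _ in range(n)]


# Exact values computed from the p.m.f. P(X = k) = 1/6
exact_mean = sum(Fraction(k, 6) for k in FACES)
exact_var = sum(Fraction(1, 6) * (k - exact_mean) ** 2 for k in FACES)

# Absolute frequencies of each face for sequences of increasing length
for n in (10, 100, 1000):
    counts = Counter(throw_dice(n))
    print(n, [counts[k] for k in FACES])

# Sample average and sample variance (n - 1 denominator) of 100 throws
sample = throw_dice(100)
print("exact: ", float(exact_mean), float(exact_var))
print("sample:", mean(sample), variance(sample))
```

The sample values change on every run; calling `random.seed(...)` with a number of your choice before the simulation makes a run repeatable.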

## 1.3 or a continuous one!

Now, let's consider an event of a more continuous (and still numerical) nature - the land of continuous r.v.'s.

Consider a casual setting for a dystopian (yet rational) dream:
* there is only one way to travel from Lisbon to Porto: it is a bus that passes once a day;
* the bus is known to depart from Lisbon at any instant of the day;
* furthermore, there is no evidence that favours a specific time-period for departure - it's _uniformly random_.

For ease of computation, say we represent the time of day at which the bus departs with an r.v. $X$ with sample space $[0, 1[$.

1. What's the probability that the bus departs after noon? - or, _in our words_, what's $P(X>0.5)$?
2. What's the probability that the bus departs _exactly_ at 12h00, 0 seconds, 0 nanoseconds (- and 0-on until the very ultimate atomic unit of time, whatever that may be)? - what's $P(X=0.5)$?
3. What's the relation between $P(X>0.5)$ and $P(X\geq 0.5)$?
4. What's the probability that the bus departs between 8h00 and 18h00?
5. What about $P(\frac{1}{3} < X < \frac{3}{4} | X > \frac{1}{2})$? - use both the intuition on the meaning of the mathematical expression, and the product rule, to arrive at the (same!) result.
6. What are the expected value and variance of $X$?
7. Use `python`'s `random` to simulate a sequence of actual departures of the bus in a given year (with 365 days, say).
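
For the last question, a possible sketch: `random.random()` returns a float in $[0, 1[$, which matches the sample space chosen for $X$. The helper names `simulate_departures` and `as_clock` are made up for this example, and the conversion to clock time is just a convenience for reading the output.

```python
import random
from statistics import mean, variance


def simulate_departures(days=365, rng=random):
    """One departure time per day, as a fraction of the day in [0, 1)."""
    return [rng.random() for _ in range(days)]


def as_clock(x):
    """Convert a fraction of the day into an 'HHhMM' string."""
    minutes = int(x * 24 * 60)
    return f"{minutes // 60:02d}h{minutes % 60:02d}"


departures = simulate_departures()

# Share of simulated days on which the bus left after noon
after_noon = sum(x > 0.5 for x in departures) / len(departures)

print([as_clock(x) for x in departures[:5]])
print("fraction after noon:", after_noon)
print("sample mean and variance:", mean(departures), variance(departures))
```

Comparing the printed fraction, mean and variance with your answers to questions 1 and 6 is a quick sanity check; with only 365 draws, expect them to be close but not equal.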