# Jupyter-first courses: Section 1 (The Why)

## "Don't learn to code—Code to learn"

For years now, we have heard calls for “learning to code.” These calls may be well-intentioned and worthwhile, but they miss the point: what we need is _coding to learn_.

[Seymour Papert](https://en.wikipedia.org/wiki/Seymour_Papert), who claimed that “children can learn to program and that learning to program can affect the way they learn everything else,” is the father of _computational thinking_. But what did he mean by this? In his own words: “It did not occur to me that anyone could possibly take my statement to mean that learning to program would by itself have consequences for how children learn and think. \[…\] Encouraging programming as an activity meant to be good in itself is far removed, in its nature, from working at identifying ideas that have been disempowered and seeking ways to re-empower them. \[…\] Of course, it is harder to think about ideas than to bring a programming language into a classroom.” (Papert, 2000.)

In this first section, I want to unpack the richness of Papertian thought about the role of computational technology, and explain what I mean by “coding to learn.” This discussion represents “the why” behind most of my educational efforts in the past few years.

The national narrative about “learning to code” is often about jobs, or about producing skilled workers for the needs of our technology-fueled economy. Salient quotes from prominent people include:<sup>1</sup>

President Bill Clinton: 
> At a time when people are saying “I want a good job – I got out of college and I couldn't find one,” every single year in America there is a standing demand for 120,000 people who are training in computer science.

Michael Bloomberg, Former Mayor, New York City:
>New York City’s economic future depends on it, and while we’re already giving thousands of our students the opportunity to learn how to code, much more can and should be done.

Arne Duncan, Former Secretary of Education:
>To compete in a global market, our students need high-quality STEM education including computer science skills such as coding.

Sir Richard Branson gets closer to the core of it:
>Teaching young people to code early on can help build skills and confidence and energize the classroom with learning-by-doing opportunities.

Meanwhile, anyone repeating the recurrent theme that simply learning to code will somehow make you better at thinking, overall, or that everyone should learn to think like a computer scientist to get better at problem-solving, is just misinterpreting Papert’s ideas.

To bring the point home, let’s study a deep example from a semi-obscure gem of a 1995 paper by [Uri Wilensky](https://www.mccormick.northwestern.edu/research-faculty/directory/profiles/wilensky-uri.html), Professor of Learning Sciences and Computer Science at Northwestern University. Wilensky (1995) examines the case study of a learner faced with a probability paradox. Attacking the paradox computationally, she develops strong intuition about the ideas of randomness and distribution. Concepts such as these are a stumbling block for learning probability, and engaging computationally to “solve” a paradox can motivate learners and drive them to build new mathematical understanding.

Learning probability and statistics was already essential in the 1990s for all students in the natural, physical, and social sciences, and it is even more critical today. We could argue that it is necessary knowledge to be an informed citizen in modern society, with statistics mentioned in the media with increasing frequency, and data science penetrating all sectors of the economy. Unfortunately, lack of understanding of probability and statistics is rife, even among professionals. College students are also generally averse to the subject, which is typically presented in highly formal instructional settings (formulas to be memorized). Developing intuition about probability is difficult because, in our everyday lives, we rarely encounter large numbers of repeated trials, a fundamental ingredient of probability. Exploratory computational environments let us run and manipulate large numbers of trials, and so help develop probabilistic reasoning.

In the Wilensky (1995) case study, a learner explores the meaning of randomness. She is presented with this seemingly well-defined question: **If you choose a random chord on a given circle, what is the probability that the chord is longer than the circle’s radius?**

It turns out that this question has a hidden ambiguity and admits more than one answer. The learner in the case study, Ellie, is a professional with a good undergraduate background in mathematics. Yet, like virtually anyone else, she struggles with the meaning of randomness. She begins her exploration of the question by drawing a circle and tracing a chord whose length equals the circle’s radius (see the figure below). Then she draws the radii from the chord's endpoints. The chord and the two radii form an equilateral triangle, and she sees that six such triangles fit inside the circle. She draws another chord of length equal to the radius, from one endpoint of the first (point **P** in the figure). She develops an argument for the solution: having picked point **P**, you get a chord shorter than a radius only if the second point falls within an arc of 60 degrees on either side of **P**. Each arc is one sixth of the circle, so together they cover one third. Thus, the probability of picking a chord shorter than a radius is $1/3$, and that of picking a chord longer than a radius is $2/3$. That's Ellie's answer.


![](../images/Bertrand_chords1.png)
#### A learner's diagram to answer the probability question

Ellie's interviewer poses a different way to look at the problem (see figure below). He says: imagine point **P** is now the midpoint of a chord of length one radius, and we draw the perpendicular from **P** to the center of the circle, **O**. With the radius drawn from one endpoint of the chord, this forms a right triangle, with one side of length $R/2$ and the hypotenuse of length $R$.

Using Pythagoras, $OP^2 = R^2 - (R/2)^2 = \tfrac{3}{4} R^2$, so the side **OP** has length $R \sqrt{3/4}$.
Now imagine you draw a circle C2 of that radius, concentric with the original circle C1: any point inside C2, taken as the midpoint of a chord of C1, will result in a chord longer than $R$. Any point in the annulus between C1 and C2, taken as the midpoint of a chord of C1, will result in a chord shorter than $R$.
We take the probability of choosing a random chord that is longer than $R$ as the ratio of the areas of C2 and C1: $\tfrac{3}{4} \pi R^2$ to $\pi R^2$. So the answer is $3/4$.

![](../images/Bertrand_chords2.png)
#### A diagram of the interviewer's challenge to Ellie's answer

Faced with the paradox of two different yet reasonable answers to the question, Ellie is uneasy. Surely only one can be the _right_ answer. Prompted by the interviewer, Ellie sets about writing a computer program to simulate an experiment and calculate the "correct" probability value. In the process of writing the program, she is uneasy again: she has to choose a method to generate the random chords. She chooses an approach consistent with her initial argument and computes some statistics, after a while getting the value of $2/3$. But she realizes this doesn't settle the question: if she generated the chords using a method matching the interviewer's idea, she might get $3/4$. Which is the correct way of generating the chords?
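
To make the two experiments concrete, here is a minimal Python sketch of both ways of generating chords. It is not the program Ellie wrote in Wilensky's study; the function names, the number of trials, and the seed are all assumptions chosen for illustration. The first sampler joins two points picked uniformly on the circumference (Ellie's argument); the second picks the chord's midpoint uniformly inside the circle (the interviewer's argument).

```python
import math
import random


def chord_from_endpoints(rng, radius=1.0):
    """Ellie's method: join two points chosen uniformly on the circumference."""
    theta1 = rng.uniform(0.0, 2.0 * math.pi)
    theta2 = rng.uniform(0.0, 2.0 * math.pi)
    # A chord subtending a central angle d has length 2 R |sin(d / 2)|.
    return 2.0 * radius * abs(math.sin((theta1 - theta2) / 2.0))


def chord_from_midpoint(rng, radius=1.0):
    """Interviewer's method: choose the chord's midpoint uniformly inside the circle."""
    while True:
        # Rejection sampling: draw from the bounding square, keep points inside the circle.
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        dist_sq = x * x + y * y
        if dist_sq <= radius * radius:
            # A chord whose midpoint lies at distance d from the center
            # has length 2 sqrt(R^2 - d^2).
            return 2.0 * math.sqrt(radius * radius - dist_sq)


def fraction_longer_than_radius(sampler, n_trials=100_000, radius=1.0, seed=0):
    """Estimate the probability that a sampled chord is longer than the radius."""
    rng = random.Random(seed)
    hits = sum(sampler(rng, radius) > radius for _ in range(n_trials))
    return hits / n_trials


print("endpoints:", fraction_longer_than_radius(chord_from_endpoints))
print("midpoint: ", fraction_longer_than_radius(chord_from_midpoint))
```

Up to sampling noise, the first estimate should land near $2/3$ and the second near $3/4$. Both programs are correct; they simply run different experiments.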

Translating her reasoning into a computer program shifted Ellie's attention from the task of finding the probability value to that of finding the best way to generate random chords. This shift allowed her to investigate the meaning of "random." She realized that, depending on which experiment she chose to implement, she would get one answer or the other.

Programming a simulated experiment for the probability question allowed Ellie to face the dilemma of how to define a random chord. She discovered that different computational experiments generate different sets of chords—i.e., different _distributions_ of chord lengths. Realizing there's no unique way of creating random chords gave her authentic insight into the connection between randomness and distribution.
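
The link between the choice of experiment and the resulting distribution can be seen by tabulating chord lengths rather than a single probability. The sketch below is an illustration under assumptions (the function names, the four-bin histogram, and the seed are not from the case study). It bins chord lengths over $[0, 2R]$ for the endpoint-based and midpoint-based experiments.

```python
import math
import random


def endpoint_chord(rng, radius=1.0):
    """Chord joining two uniformly chosen points on the circumference."""
    angle = rng.uniform(0.0, 2.0 * math.pi) - rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * radius * abs(math.sin(angle / 2.0))


def midpoint_chord(rng, radius=1.0):
    """Chord whose midpoint is uniform over the area of the disk."""
    # For a point uniform over the disk, the squared distance from the center
    # is uniform on [0, R^2], so d^2 = R^2 * u with u uniform on [0, 1).
    u = rng.random()
    return 2.0 * radius * math.sqrt(1.0 - u)


def length_histogram(lengths, radius=1.0, n_bins=4):
    """Fraction of chords in each of n_bins equal-width bins over [0, 2R]."""
    width = 2.0 * radius / n_bins
    counts = [0] * n_bins
    for length in lengths:
        index = min(int(length / width), n_bins - 1)  # length == 2R goes in the last bin
        counts[index] += 1
    return [count / len(lengths) for count in counts]


rng = random.Random(0)
n_trials = 100_000
for name, sampler in [("endpoints", endpoint_chord), ("midpoint", midpoint_chord)]:
    fractions = length_histogram([sampler(rng) for _ in range(n_trials)])
    print(name, [round(f, 3) for f in fractions])
```

With a unit radius, the first two bins together hold the chords shorter than the radius, so their sum should come out near $1/3$ for the endpoint experiment and near $1/4$ for the midpoint experiment. The two experiments also spread the remaining chords differently across the longer bins: same circle, different distributions.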

Wilensky (1995) goes on to discuss the difference between the experience of this learner and that of another faced with a specialized "black-box" modeling package. With the latter, learners can manipulate parameters in pre-built distributions, but cannot explore the structure underlying the model. Citing evaluations of the stats package's effectiveness, he notes that learners could never correctly answer this test question: "What is it about a variable that makes it a random variable?"
After hours of manipulating distributions, they still could not connect with the concept of "random."
In contrast, learners in Wilensky's study developed strong intuitions about the meaning of randomness by explicitly writing a computer program to investigate it. He concludes:
> This trialogue between Ellie's mental model, the expression of her mental model in encapsulated code and the running of that code, allowed Ellie to successively refine the creative structure of her thought.

## Notes

1. Quotes from: https://codebeedo.com/what-did-they-say-about-coding/


## References

- Papert, S., 2000. What's the big idea? Toward a pedagogy of idea power. IBM Systems Journal, 39(3.4), pp.720-729.

- Wilensky, U., 1995. Paradox, programming, and learning probability: A case study in a connected mathematics framework. The Journal of Mathematical Behavior, 14(2), pp.253-280, https://doi.org/10.1016/0732-3123(95)90010-1
