diff --git a/1-intro/ai.html b/1-intro/ai.html
index 947947db..3fd43294 100644
--- a/1-intro/ai.html
+++ b/1-intro/ai.html
@@ -567,9 +567,9 @@

Artificial Intelligence-Assisted Coding

Artificial Intelligence-Assisted Coding#

-

The world of coding and data science changed dramatically and seismically in 2021, with the public release of large language models (LLMs). These models, which are trained on massive amounts of text data, can be used to generate code that performs a variety of tasks. For example, the GPT-3 model can generate code based on written directions from a user. Whereas in the past, even experienced coders and data scientists relied heavily on web searches, online forums, and documentation to find code that performs a desired task, now they can simply write a description of what they want, and the model will generate code for them. This can a huge time-saver, and allows us as data scientists to instead focus on what we want to do with our data, rather than how to do it.

+

The world of coding and data science has changed dramatically and seismically since 2021, with the public release of large language models (LLMs) specifically trained to help write code. These models, which are trained on massive amounts of text data, can be used to generate code that performs a variety of tasks. For example, the GPT-3 model can generate code based on written directions from a user. Whereas in the past, even experienced coders and data scientists relied heavily on web searches, online forums, and documentation to find code that performs a desired task, now they can simply write a description of what they want, and the model will generate code for them. This can be a huge time-saver, and allows us as data scientists to instead focus on what we want to do with our data, rather than how to do it.

In this course, you will learn how to use AI-assisted coding tools to help write code to perform data science tasks — most notably, GitHub Copilot. This has rapidly become the way coders and data scientists work, so it’s not “cheating” — it’s a new tool that can empower you to be more productive. Most importantly, AI assistants will allow us to focus on what we want to do with data, and get us insights faster. These aspects of the course are the most important, but prior to AI tools it took a long time to get there, due to the slow and often tedious process of learning how to code.

-

At the same time, it’s important to recognize that AI-generated code is not perfect, and requires careful evaluation and debugging. LLMs essentially predict the next word in a sentence, based on the previous words — and the same is true when they generate code. They are trained on large samples of existing code that is publicly available (most of which, hopefully, actually works), and do their best to provide you with appropriate code to your prompt. But they are not perfect, and sometimes they will generate code that doesn’t work, or doesn’t actually perform the operation(s) that you intended. The less common a task is, the fewer examples of it the model will have been trained on, and the less likely it is to generate good code for the task. So it’s critical to learn how to evaluate the code that AI tools generate, and make sure it does what you want it to do. This will also be a focus of this course.

+

At the same time, it’s important to recognize that AI-generated code is not perfect, and requires careful evaluation and debugging. LLMs essentially predict the next word in a sentence, based on the previous words, and the same is true when they generate code. They are trained on large samples of existing code that is publicly available (most of which, hopefully, actually works), and do their best to provide you with appropriate code for your prompt. But they are not perfect, and sometimes they will generate code that doesn’t work, or doesn’t actually perform the operation(s) that you intended. The less common a task is, the fewer examples of it the model will have been trained on, and the less likely it is to generate good code for the task. So it’s critical to learn how to evaluate the code that AI tools generate, and make sure it does what you want it to do. This will be a focus of this course.

- + @@ -556,6 +558,14 @@

Connectivism

+
+

Contents

+
+
@@ -567,10 +577,13 @@

Connectivism

Connectivism#

-

Connectivism is a learning theory first introduced in 2005, by two separate academics: Siemens Siemens [Sie18] and Downes Downes [Dow05]. Rooted in constructivism, connectivism is “a learning theory for the digital age” that emphasizes the fact that in the 21st century, much knowledge is externalized from human minds, in the form of the internet. In our hyperconnected world, there is less emphasis on, or need for, individuals to remember specific facts or procedures, because there are huge amounts of information readily accessible when the knowledge is needed. As well, information is increasingly vast, complex, and changing. So learning becomes not just learning and remembering facts, but learning how to use specialized online knowledge bases, and “connect” information between them.

+

Connectivism is a learning theory first introduced in 2005, by two separate academics: Siemens [Sie18] and Downes [Dow05]. Rooted in constructivism, connectivism is “a learning theory for the digital age” that emphasizes the fact that in the 21st century, much knowledge is externalized from human minds, in the form of the internet (and more recently, artificial intelligence, or AI). In our hyperconnected world, there is less emphasis on, or need for, individuals to remember specific facts or procedures, because there are huge amounts of information readily accessible when the knowledge is needed. As well, information is increasingly vast, complex, and changing. So learning becomes not just learning and remembering facts, but learning how to use specialized online knowledge bases, and “connect” information between them.

This is very true in data science. Practitioners rarely know all the details of how to use a particular programming language — the names of every possible command, or how to use them. Instead, data scientists rely on the documentation for these programming languages on the internet. This includes the official documentation, questions and answers posted on help forums such as Stack Exchange, written tutorials, YouTube videos, books, and more. Figuring out how to do something new is virtually a daily occurrence when working in data science, and so the ability to know how to find and evaluate the necessary information — and connect it across sources to solve your problem — is just as important as one’s existing coding skills.

While this knowledge is externalized in digital technology, it is, ultimately, the product of human knowledge and human effort. Thus, like constructivism, connectivism emphasizes the importance of social interaction in learning — but this social interaction may be asynchronous, such as when one person records a YouTube tutorial and someone else watches it months later.

-

Connectivism informs the mindset you should bring to this course. There is little emphasis on memorizing information, except to the extend that knowledge becomes more ingrained as you use it. Instead, the course emphasizes an attitude of continuous improvement and “life hacking”, built on skills of properly understanding and characterizing a problem, doing the appropriate searches to find the necessary information to solve the problem, and then applying that information to deliver the solution. In doing so, you must be a critical evaluator of the information you are finding (since not all information on the internet is created equal). As well, this course encourages and rewards students for contributing to the class knowledge base, through demos, peer teaching, peer assessments, and team projects.

+
+

Connectivism informs the mindset you should bring to this course#

+

There is little emphasis on memorizing information, except to the extent that knowledge becomes more ingrained as you use it. Instead, the course emphasizes an attitude of continuous improvement and “life hacking”, built on skills of properly understanding and characterizing a problem, doing the appropriate searches to find the necessary information to solve the problem, and then applying that information to deliver the solution. In doing so, you must be a critical evaluator of the information you are finding (since not all information on the internet is created equal). As well, this course encourages and rewards students for contributing to the class knowledge base, through demos, peer teaching, peer assessments, and team projects.

+
- + @@ -556,6 +558,16 @@

Start with why

+
+

Contents

+
+
@@ -569,10 +581,19 @@

Start with why

Start with why#

Why are you here, reading this? What do you hope to get out of a course in “neural data science”? These are questions for you to answer for yourself, but I can tell you why I designed this course, and what I hope you will get out of it.

I’ve been involved in psychology and neuroscience research for over 25 years, and from the beginning I recognized that coding skills were highly prized in every lab I worked in, or knew of. And yet, coding was not typically part of the curriculum in the programs I was in; at best, it was an elective, but more commonly students learned to code on their own, to varying degrees of success and proficiency. In my own case, I learned to code largely by trying to understand code written by others that did things with data that I wanted to do, or similar things.

+
+

This is Not CS 101#

Programming courses are usually taught through computer science departments, or faculties, and these are most commonly oriented towards computer science students. But the goals of computer science students and programs are quite different from those of scientists who want to use code (programming) to understand data. As a result, science students sometimes find what they learn in computer science classes hard to relate to their discipline. At the same time, there is a huge difference between “hacked together” code written by self-taught scientists, and clearly written code that follows best practices of style. Good code is efficient and understandable. And a deeper, more systematic understanding of code leads to code that is more likely to be accurate. Proficiency with code also empowers you to do things with data that you might not otherwise be able to do.

+
+
+

Neuroscience Needs Data Science#

I realized there was a need for neuroscience and psychology students to learn how to use a programming language to work with data (and a 2021 paper in Nature Neuroscience agrees with me). More fundamentally, I recognized that there was a need for students in these fields to develop greater “fluency” in working with data. In the same way that we develop fluency in language, we can develop a fluency in working with data to organize, summarize, and visualize it — and ultimately, derive meaning from it. This sentiment is captured in this great 3 min video by McGill Neuroscience grad student Emily Irvine. This course aims to address these needs. The course has also been drastically revised as of 2023 to incorporate more machine learning and AI tools, and to focus less on technical aspects of coding, and more on the actual “data science” aspects of the course.

+
+
+

Data Science Skills Have Value Beyond Neuroscience#

Another factor that drove the development of this course was my recognition that the majority of people who pursue undergraduate coursework in neuroscience and psychology don’t end up working as scientists in those fields, even the ones who get PhDs! Indeed, in the USA as many PhDs are working in industry as in academia (Science, 2019), and that’s across all age groups. Estimates of the odds of currently-graduating science PhDs getting a job in academia range from 20-50% (UofT, 2018). Coding, and data science, are valuable and highly employable skills that are much more widely applicable than the specific disciplinary training you get working on a particular research topic.

I have experience as a scientist collaborating with companies on research and development, and I teach design thinking, innovation, and entrepreneurship through the SURGE program. These experiences have shown me that data science and critical thinking skills, combined with a background in neuroscience or psychology, are highly valued. I’ve also talked to many recent graduates who found that their lack of coding skills held them back from the most interesting (and lucrative) job opportunities. Almost universally, these opportunities are in the knowledge economy, be that startups, big tech companies, healthcare, government, or other sectors. These fields all rely on people’s abilities to work with data, interpret it, and use it to make decisions. Training in data science will thus both prepare you to work more effectively in psychology and neuroscience, and provide you with fundamental, cross-cutting skills that you will likely find useful whatever direction your future takes you.

+