# Introduction to CSS 100

## Goals of this lecture

- Quick introductions.
- Overview of class content. 
  - Brief tour of CSS 100.
- Course logistics.

## Who are we?

**Teaching Team**:

- [Sean Trott](https://seantrott.github.io/): Assistant Teaching Professor in Cognitive Science and CSS. 
   - Research: Large language models (LLMs) and human language comprehension.
- TA: Muhammad Karim.

## Course Overview

### What is CSS?

In a nutshell, [Computational Social Science](https://en.wikipedia.org/wiki/Computational_social_science) focuses on **computational approaches** to **social science**.

At UCSD, [Social Sciences](https://socialsciences.ucsd.edu/) encompasses many disciplines:

- Psychology.  
- Economics.
- Political Science.
- Cognitive Science. 
- Urban Studies and Planning.  

And [many more](https://socialsciences.ucsd.edu/about/org-chart.html)!

### CSS 100 in the UCSD CSS Ecosystem

**CSS 100** is one of the [three core courses for the CSS undergraduate minor](https://css.ucsd.edu/undergraduate-minor/index.html).

- CSS 1: Introduction to Python programming (and a little bit of data science).  
- CSS 2: Ethics, wrangling, visualizing, and basic modeling of social science data. 
- CSS 100: Advanced programming for CSS.

CSS is very *broad*, so what should "advanced programming" cover?

### CSS 100: Various perspectives

> **CSS 100** equips students with *advanced techniques* for processing, modeling, and analyzing social science data of various kinds.

- Building on *CSS 1*.
   - "Leveling up" programming skills.
- Building on *CSS 2*.
   - *Supervised* and *unsupervised* machine learning techniques.  
   - Techniques for *inferential statistics* (e.g., **resampling**). 
- New content!
   - [Natural Language Processing (NLP)](https://en.wikipedia.org/wiki/Natural_language_processing): using text as data!
   - [Deep learning](https://en.wikipedia.org/wiki/Deep_learning) and [Large Language Models](https://en.wikipedia.org/wiki/Large_language_model). 
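
As a preview of **resampling**, here is a minimal sketch of a bootstrap using only the standard library. The function name `bootstrap_means`, the `scores` data, and the choice of 1,000 resamples are made up for illustration; the course may approach this differently.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Resample `sample` with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=len(sample))  # draw with replacement
        means.append(statistics.mean(resample))
    return means

scores = [3, 5, 4, 6, 2, 5, 4, 7]  # made-up data for illustration
means = sorted(bootstrap_means(scores))

# Take the 2.5th and 97.5th percentiles as an approximate 95% interval.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means))]
print(f"Sample mean: {statistics.mean(scores)}")
print(f"Approximate 95% interval: ({lower}, {upper})")
```

The spread of the resampled means gives a rough sense of how uncertain the sample mean is, without assuming a particular distribution.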

### Building on CSS 1: Leveling up programming

CSS 1 (or equivalent) builds **foundational Python programming skills**.

- *Control flow*: `for` loops, `if/else` statements.  
- *Functions*: default arguments, etc. 
- *Collections*: `list`s, `dict`ionaries, and more.  
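
As a quick refresher, the short sketch below combines all three ideas. The function `count_majors` and the `majors` list are hypothetical examples, not taken from CSS 1 materials.

```python
def count_majors(majors, min_count=1):
    """Count each major, keeping only those seen at least `min_count` times."""
    counts = {}                          # collection: a dictionary
    for major in majors:                 # control flow: a for loop
        counts[major] = counts.get(major, 0) + 1

    kept = {}
    for major, n in counts.items():
        if n >= min_count:               # control flow: an if statement
            kept[major] = n
    return kept

majors = ["Psychology", "Economics", "Psychology", "Cognitive Science"]
print(count_majors(majors))               # uses the default min_count=1
print(count_majors(majors, min_count=2))  # overrides the default argument
```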

CSS 100 will *strengthen* that foundation, introduce [object-oriented programming (OOP)](https://en.wikipedia.org/wiki/Object-oriented_programming), and discuss [`Exceptions`](https://docs.python.org/3/tutorial/errors.html). 
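
For instance, handling an exception might look roughly like the sketch below. The function `safe_mean` is a hypothetical example; it catches the built-in `ZeroDivisionError` that Python raises when dividing by zero.

```python
def safe_mean(values):
    """Return the mean of `values`, or None if the list is empty."""
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:
        # An empty list has length 0, so the division above fails.
        return None

print(safe_mean([2, 4, 6]))  # 4.0
print(safe_mean([]))         # None
```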

### From functions to classes

> **Object-oriented programming** is a programming paradigm based on *objects* (instances of `class`es), which bundle associated *data* and *functions*.

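```python
class BankAccount:
    """A bank account that tracks a running balance."""

    def __init__(self):
        self.amount = 0  # every new account starts with a balance of 0

    def deposit(self, x):
        """Add x to the balance."""
        self.amount += x
```

```python
my_account = BankAccount()
print(my_account.amount)  # 0
```

```python
my_account.deposit(10)
print(my_account.amount)  # 10
```
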

### Building on CSS 2: Modeling your data

CSS 2 (or equivalent) builds **skills for data wrangling, visualization, and analysis**.  

- Importing and wrangling `.csv` files with `pandas`.   
- Visualizing data using `seaborn` and `matplotlib`.  
- Building regression models with `statsmodels`.  
- Cross-validation and clustering with `scikit-learn`.  

CSS 100 will introduce new **supervised** and **unsupervised** techniques for modeling your data.

### Advanced machine learning techniques

> **Supervised learning** involves learning a mapping from features $X$ to labels $Y$.

- Advanced regression techniques (**regularization**).  
- Beyond regression: **random forests**, **support vector machines (SVM)**.
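
To give a flavor of **regularization**, here is a minimal sketch of a one-feature, no-intercept linear model with an L2 (ridge) penalty, where the slope has the closed form $\sum x_i y_i / (\sum x_i^2 + \lambda)$. The function `ridge_slope`, the `penalty` values, and the data are made up for illustration; in practice a library such as `scikit-learn` would typically be used.

```python
def ridge_slope(xs, ys, penalty=0.0):
    """Slope of a no-intercept, one-feature linear model with an L2 penalty."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # penalty=0 gives ordinary least squares; larger penalties shrink the slope.
    return sum_xy / (sum_xx + penalty)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # made-up data, roughly y = 2x

for penalty in [0.0, 1.0, 10.0]:
    print(penalty, round(ridge_slope(xs, ys, penalty), 3))
```

As the penalty grows, the estimated slope is pulled toward zero, which is the basic trade-off that regularization introduces.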

> **Unsupervised learning** involves analyzing or modeling unlabeled data $X$.

- Extracting structure with **hierarchical clustering**.
- Dimensionality reduction with **principal components analysis**.  
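
As a rough illustration of **hierarchical clustering**, the sketch below repeatedly merges the two closest clusters of one-dimensional points, using single linkage (the distance between the closest pair of members). The function `single_linkage` and the `ages` data are hypothetical, and it assumes `n_clusters` is at least 1; it is not a substitute for a library implementation.

```python
def single_linkage(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain (1-D points)."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) for the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                dist = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

ages = [18, 19, 21, 40, 42, 65]  # made-up data for illustration
print(single_linkage(ages, n_clusters=3))  # [[18, 19, 21], [40, 42], [65]]
```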

### New content!

We will also introduce several *new* topics.

- Techniques for [Natural Language Processing (NLP)](https://en.wikipedia.org/wiki/Natural_language_processing): using text as data.
   - Tokenization and sentiment analysis with `nltk`.  
   - Parsing and embeddings with `spaCy`.  
- [Deep learning](https://en.wikipedia.org/wiki/Deep_learning).
   - Basics of *neural networks*.  
   - Building a simple neural network in Python.  
- [Large Language Models](https://en.wikipedia.org/wiki/Large_language_model). 
   - LLMs: architectures, approaches, and background.
   - Introduction to `transformers`, a popular Python package for using LLMs.
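
To make "text as data" concrete, here is a minimal sketch of tokenization and lexicon-based sentiment scoring. It uses only the standard library as a stand-in for `nltk`; the word lists `POSITIVE` and `NEGATIVE` and both functions are made up for illustration.

```python
import re
from collections import Counter

POSITIVE = {"good", "great", "love"}   # toy lexicon, made up for illustration
NEGATIVE = {"bad", "boring", "hate"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(tokens):
    """Number of positive words minus number of negative words."""
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return pos - neg

tokens = tokenize("I love this class. The labs are great, not boring!")
print(tokens)
print(sentiment_score(tokens))  # 1
```

Note that this simple word count treats "not boring" as negative, which is one reason more sophisticated NLP tools exist.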

## Course Logistics

### Course Structure

Class time is divided into *lecture* and *section*.

- Lecture is a time to **introduce**, **explain**, and **demonstrate** new concepts.  
  - There will be a focus on **hands-on practice** (i.e., "check-ins"). 
- Section is a time to **practice** and **develop further fluency** with these concepts.  


### Following along in lecture

- Lecture will have many opportunities to **follow along** via **check-ins**.  
- I do recommend doing this, whether you're in-person or watching the podcast! 
- The lectures can all be found on GitHub, and downloaded or **cloned** into your DataHub account: 
   - Link: https://github.com/seantrott/css100_lectures
   - Will be updated throughout the quarter.

### Grading and Assessments

- Most weeks will have a **coding lab** due the following week.
- There are also **four problem sets**, which will be auto-graded.
- There is also a **final exam**: half on Canvas, half on DataHub (all online).

| Grade Component | Percentage of Final Grade |
| --------------- | ------------------------- |
| 8 Coding Labs | 50% (6.25% each) |
| 4 Problem Sets | 32% (8% each) |
| 1 Final Exam | 18% |
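
The weights in the table can be combined as in the sketch below. This assumes every score is a percentage (0 to 100), that labs and problem sets are each weighted equally within their category, and that no scores are dropped; the function `final_grade` and the example scores are hypothetical, so check the syllabus for the actual policy.

```python
WEIGHTS = {"labs": 0.50, "problem_sets": 0.32, "final_exam": 0.18}

def final_grade(lab_scores, pset_scores, exam_score):
    """Weighted course grade (0-100) from per-assignment percentages."""
    lab_avg = sum(lab_scores) / len(lab_scores)
    pset_avg = sum(pset_scores) / len(pset_scores)
    return (WEIGHTS["labs"] * lab_avg
            + WEIGHTS["problem_sets"] * pset_avg
            + WEIGHTS["final_exam"] * exam_score)

labs = [100, 90, 95, 100, 85, 100, 90, 100]  # made-up scores for 8 labs
psets = [90, 80, 100, 90]                    # made-up scores for 4 problem sets
print(round(final_grade(labs, psets, 80), 2))  # 90.7
```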

### Expectations

- Course will involve programming in Python.  
  - We will review basics, but expectations include [CSS 1 content](https://ucsd-css2.github.io/ucsd-css2-website/course/expectations.html) and [CSS 2 content](https://ucsd-css2.github.io/ucsd-css2-website/course/syllabus.html). 
  - Lab 1 will include some review of data wrangling, etc.
- Will also involve using **DataHub** (and Jupyter notebooks).
- Lecture/section attendance not required.  

### Academic Integrity

From the syllabus:

> Please turn in your own work. While you are encouraged to work together on some assignments (e.g., on [labs](../labs/overview.md)), you should still understand the code you've submitted. Problem sets and final project should be completed independently.

> Please review academic integrity policies [here](http://academicintegrity.ucsd.edu). Cheating and plagiarism are unfair to other students and ultimately to yourself, and you will be penalized if caught. Instead, if you're struggling with something, please come to office hours and ask for help! 


### Note on course modality

- CSS 100 is an in-person course, although lectures will be *podcasted*.  


## Welcome to CSS!