[homework: doing, reading, watching] Linear, logistic regressions and embedding visualizations of Zulip data #163

siimre · 2023-05-06T15:19:54Z

Doing

Clean the Data Thinking Zulip chat data, located at https://github.com/onefact/datathinking.org-codespace/blob/main/data/datathinking.zulipchat.com/raw/messages-000001.json . Put it in a polars dataframe and compute summary statistics of the dataset
Analyze this Zulip chat data using logistic regression, linear regression, and embeddings with the tools we have learned in the lectures (don't forget to ask ChatGPT, Claude, Lex, GPT-4 for help as much as you need, and ask for help on the Data Thinking Zulip chat :)
Create a visualization of logistic regression of the Data Thinking Zulip chat data
Create a visualization of linear regression applied to the Data Thinking Zulip chat data
Create a visualization of embeddings using the Data Thinking Zulip Chat data
Make a copy of the Overleaf template: https://www.overleaf.com/read/ghpyzqwqwxpv (you need to create an account and/or sign in if this is your first time using Overleaf). To make a copy, open the project using this link after signing in, then click on Menu, then Copy Project.

In Overleaf, edit the template and figure out how to include a PDF figure in the report, alongside a brief description (a few sentences or paragraphs is fine!) of each of the analyses you performed, why you chose them, and the math equation for the linear regression, logistic regression, and embedding you used.
Add the PDF of the report to this issue as a comment.
Send a message on Zulip with a link to this comment, alongside the image representing your favorite visualization
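
A minimal sketch of the first task, under assumptions that should be checked against the actual file: it assumes the export stores messages under a top-level `zerver_message` key, each with `sender` and `content` fields. The three sample messages are made up, and `to_rows` and `summarize` are hypothetical helper names. It uses only the standard library; the list of dicts it builds is the kind of input that `polars.DataFrame` accepts, so the same rows could be handed to polars for the real analysis.

```python
import json
import statistics

# Made-up sample mimicking the ASSUMED export layout (not real course data).
SAMPLE = json.loads("""
{"zerver_message": [
  {"id": 1, "sender": 10, "content": "Hello everyone"},
  {"id": 2, "sender": 11, "content": "Hi! How do I load the JSON file?"},
  {"id": 3, "sender": 10, "content": "Try json.load"}
]}
""")


def to_rows(export):
    """Flatten the export into one dict per message, keeping a few fields."""
    return [
        {
            "sender": m["sender"],
            "content": m["content"],
            "length": len(m["content"]),  # message length in characters
        }
        for m in export.get("zerver_message", [])
    ]


def summarize(rows):
    """Compute simple summary statistics over the flattened messages."""
    lengths = [r["length"] for r in rows]
    return {
        "n_messages": len(rows),
        "n_senders": len({r["sender"] for r in rows}),
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        "max_length": max(lengths),
    }


rows = to_rows(SAMPLE)
summary = summarize(rows)
print(summary)
```

For the real data, `SAMPLE` would be replaced by the parsed contents of a local copy of the JSON file, after confirming that the key and field names match.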

Reviewing

Review how Jaan got unstuck in the lecture recordings at https://panopto.ut.ee/Panopto/Pages/Sessions/List.aspx?folderID=43bb180c-79a6-4324-b055-afa400ecd1a0
Review collaborative whiteboards from past classes: listed at https://www.datathinking.org/university-of-tartu
Review Jupyter notebooks from past classes:

Reading

Watching

(message Jaan if you need a VPN or these links don't work)

[context for popular media portrayals of chatgpt] South Park S26E04 - "Deep Learning" - https://southpark.cc.com/episodes/8byci4/south-park-deep-learning-season-26-ep-4
[context for how the CEO of OpenAI talks about it] https://abcnews.go.com/Technology/video/openai-ceo-cto-risks-ai-reshape-society-97949497 (ABC News video)
[an emotional history of the world; context for our data thinking practice of feel, ask, do, think -- and how the thoughts in people's heads can shape society] 1 episode of https://thoughtmaybe.com/cant-get-you-out-of-my-head/


siimre · 2023-06-20T19:43:54Z

HW3_by_SR.pdf

siimre · 2023-06-20T20:45:21Z

Critique from me: The regression results are not correct. A linear regression fit should appear as a straight line, and a logistic regression fit as an S-shaped (sigmoid) curve. All the formulas are nicely laid out and explained. References are correct.
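
To check what shape each fit should have, here is a minimal sketch on made-up toy data (not the Zulip data or the code from the report). It fits y = a + b*x by closed-form least squares and p = 1 / (1 + exp(-(a + b*x))) by plain gradient descent on the log-loss; `fit_line` and `fit_logistic` are hypothetical helper names, and the learning rate and step count are arbitrary choices.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, steps=5000):
    """Gradient descent on the mean log-loss for p = sigmoid(a + b*x)."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [sigmoid(a + b * x) - y for x, y in zip(xs, labels)]
        a -= lr * sum(errs) / n
        b -= lr * sum(e * x for e, x in zip(errs, xs)) / n
    return a, b


# Toy data (invented): a roughly linear target and a noisy 0/1 label.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 3.1, 4.9, 7.2, 9.0, 11.1]
labels = [0, 0, 1, 0, 1, 1]

line_a, line_b = fit_line(xs, ys)
log_a, log_b = fit_logistic(xs, labels)

# Evaluate the fitted logistic curve on a wider grid to see its shape.
grid = [x / 2 for x in range(-10, 21)]
probs = [sigmoid(log_a + log_b * x) for x in grid]
print(round(line_b, 3), [round(p, 2) for p in probs[::6]])
```

Plotting `line_a + line_b * x` against `x` gives a straight line, while `probs` against `grid` stays between 0 and 1 and rises monotonically, which is the S shape rather than a hump.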

siimre · 2023-06-20T20:52:24Z

Critique from ChatGPT:

Abstract:
The abstract provides a brief overview of the document, mentioning the context of Homework 3 in the DataThinking course and the focus on analyzing a conversation dataset. However, it could benefit from providing more specific details about the dataset and the insights derived.
Introduction:
The introduction briefly introduces the purpose of the study, which is to examine conversation patterns, critical topics, and the relationship between message content, length, and sender identification. It could be improved by providing more context on why these aspects are important and what potential applications or implications they might have.
Data, Cleaning, and Preprocessing:
This section mentions the use of Zulip chat data for analysis and the extraction of message content and sender identification. However, it lacks details about the cleaning and preprocessing steps performed on the data. Providing more information about the specific techniques used would enhance the clarity and reproducibility of the study.
Analyze:
This section briefly mentions the analysis performed on the dataset, including logistic regression, linear regression, and word embeddings. However, it lacks a clear explanation of the specific insights or findings derived from these analyses. Adding more details about the results and their implications would strengthen this section.
Conclusion:
The conclusion provides a summary of the study, highlighting the stages of data gathering, exploratory analysis, model development, and evaluation. However, it lacks specific details about the key findings or insights gained from the analysis. Additionally, it mentions the challenges faced during coding and provides a reference to a helpful guide, but it does not elaborate on the solutions or lessons learned from overcoming those challenges.
References:
The references section includes URLs for the dataset, debugging guide, and code file used. However, it lacks traditional academic referencing style, such as author names, publication dates, and journal/conference references.

Overall, the document provides an outline of the study and mentions the methods used for analysis. However, it lacks in-depth analysis and specific findings. To improve the document, it would be beneficial to include more detailed explanations of the insights gained from the analysis, provide a clearer structure for each section, and adhere to standard academic referencing conventions.

siimre added the homework label May 6, 2023

siimre closed this as completed Jun 20, 2023

siimre mentioned this issue Jun 20, 2023

[homework: asking, writing, thinking, doing, 🔴 red-teaming 😈] Critique what you have built; Real-World Data; Historiography of Data, Incentives, & AI #152

Closed

38 tasks
