[homework: doing, reading, watching] Linear, logistic regressions and embedding visualizations of Zulip data #163

Closed
27 tasks done
siimre opened this issue May 6, 2023 · 3 comments

siimre commented May 6, 2023

Doing

  • Clean Data Thinking Zulip chat data, located at https://github.com/onefact/datathinking.org-codespace/blob/main/data/datathinking.zulipchat.com/raw/messages-000001.json - put it in a polars dataframe and compute summary statistics of the dataset
  • Analyze this Zulip chat data using logistic regression, linear regression, and embeddings with the tools we have learned in the lectures (don't forget to ask ChatGPT, Claude, Lex, GPT-4 for help as much as you need, and ask for help on the Data Thinking Zulip chat :)
  • Create a visualization of logistic regression of the Data Thinking Zulip chat data
  • Create a visualization of linear regression applied to the Data Thinking Zulip chat data
  • Create a visualization of embeddings using the Data Thinking Zulip Chat data
  • Make a copy of the Overleaf template: https://www.overleaf.com/read/ghpyzqwqwxpv (need to create an account and/or sign in if this is your first time using Overleaf). To make a copy, open the project after signing in using this link, and click on Menu, then Copy Project:

[image: screenshot of the Copy Project option in the Overleaf menu]

  • In Overleaf, edit the template and figure out how to include a PDF figure in the report, alongside a brief description (a few sentences or paragraphs is fine!) of each of the analyses you performed, why you chose them, and the math equation for the linear regression, logistic regression, and embedding you used.
  • Add the PDF of the report to this issue as a comment.
  • Send a message on Zulip with a link to this comment, alongside the image representing your favorite visualization

Reviewing

Reading

Watching

(message Jaan if you need a VPN or these links don't work)

@siimre siimre added the homework label May 6, 2023

siimre commented Jun 20, 2023

HW3_by_SR.pdf

@siimre siimre closed this as completed Jun 20, 2023

siimre commented Jun 20, 2023

Critique from me: The regression results are not correct. A linear regression fit should appear as a straight line, and a logistic regression fit should appear as an S-shaped (sigmoid) curve. All the formulas are nicely laid out and explained. References are correct.
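As a reference for the shapes discussed in this critique, here is a small sketch using only the standard library. It fits a least-squares line to toy data and evaluates a logistic curve on a grid. The data and the coefficients passed to the logistic curve are made up for illustration and are not taken from the report. The fitted logistic probability is a monotone S-shaped curve; the hump-shaped quantity is its slope, p(1 - p), which peaks where p = 0.5.

```python
import math


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_curve(xs, intercept, slope):
    """Predicted probabilities p(x) = sigmoid(intercept + slope * x)."""
    return [sigmoid(intercept + slope * x) for x in xs]


# Toy data lying exactly on y = 1 + 2x, so the fit should recover it.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)

# Logistic curve with illustrative coefficients (intercept 0, slope 1).
grid = [x / 2 for x in range(-12, 13)]  # -6.0 ... 6.0 in steps of 0.5
probs = logistic_curve(grid, 0.0, 1.0)
slopes = [p * (1 - p) for p in probs]  # derivative of the sigmoid at each x

print(intercept, slope)
print(round(probs[0], 4), probs[12], round(probs[-1], 4))
```

Plotting `grid` against `probs` gives the S-curve, and `grid` against `slopes` gives the hump, which can help when checking whether a regression figure shows the fitted curve or something else.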


siimre commented Jun 20, 2023

Critique from ChatGPT:

  1. Abstract:
    The abstract provides a brief overview of the document, mentioning the context of Homework 3 in the DataThinking course and the focus on analyzing a conversation dataset. However, it could benefit from providing more specific details about the dataset and the insights derived.

  2. Introduction:
    The introduction briefly introduces the purpose of the study, which is to examine conversation patterns, critical topics, and the relationship between message content, length, and sender identification. It could be improved by providing more context on why these aspects are important and what potential applications or implications they might have.

  3. Data, Cleaning, and Preprocessing:
    This section mentions the use of Zulip chat data for analysis and the extraction of message content and sender identification. However, it lacks details about the cleaning and preprocessing steps performed on the data. Providing more information about the specific techniques used would enhance the clarity and reproducibility of the study.

  4. Analyze:
    This section briefly mentions the analysis performed on the dataset, including logistic regression, linear regression, and word embeddings. However, it lacks a clear explanation of the specific insights or findings derived from these analyses. Adding more details about the results and their implications would strengthen this section.

  5. Conclusion:
    The conclusion provides a summary of the study, highlighting the stages of data gathering, exploratory analysis, model development, and evaluation. However, it lacks specific details about the key findings or insights gained from the analysis. Additionally, it mentions the challenges faced during coding and provides a reference to a helpful guide, but it does not elaborate on the solutions or lessons learned from overcoming those challenges.

  6. References:
    The references section includes URLs for the dataset, debugging guide, and code file used. However, it lacks traditional academic referencing style, such as author names, publication dates, and journal/conference references.

Overall, the document provides an outline of the study and mentions the methods used for analysis. However, it lacks in-depth analysis and specific findings. To improve the document, it would be beneficial to include more detailed explanations of the insights gained from the analysis, provide a clearer structure for each section, and adhere to standard academic referencing conventions.
