# HCDE 410 Final

## Overview 
AI applications are becoming prevalent in every aspect of our lives, professionally and personally. While exploring how to foster trust in these applications as they execute a variety of complex tasks, I came to a key realization: these applications may widen existing disparities in people's ability to accomplish their goals. Getting value from an AI application requires interacting with it in natural language at a fairly high level, yet not every adult in the United States has the literacy skills to do so. AI offers a significant technology-enabled advantage, but not every user can take full advantage of it because of the friction that language introduces. From a human-centered perspective, analyzing adult literacy in the U.S. gives us insight into the demographics, the types of literacy gaps, and the potential disabilities present in the population as a whole. Empathizing with these situations is necessary for designing AI applications that close the gaps instead of creating a larger disparity between people. I hope to identify the specific demographics and groups whose literacy levels are insufficient for using existing AI applications, and to use those insights to characterize the gap between people who currently use AI applications and people who are unable to. This project is a qualitative and quantitative analysis of literacy in the United States and its implications for the use of AI applications.

This project will explore the literacy rates of adults in the US to answer the question: What demographic and socioeconomic factors are most strongly associated with low literacy rates among U.S. adults, and how might these factors predict disparities in the ability to effectively utilize AI applications requiring natural language interactions?

### Problem Statement
As AI continues to shape our personal and professional lives, its reliance on natural language interaction poses a significant challenge for individuals with low literacy skills. Factors such as education, income, and age are often closely tied to literacy levels because they influence access to resources and opportunities for skill development. This disparity in literacy creates a technological gap in which individuals who cannot interact effectively with AI are left at a disadvantage. By exploring the correlation between these factors and literacy rates, we can uncover the barriers that prevent certain groups from leveraging AI to its fullest potential. Understanding these relationships is critical to designing AI systems that are inclusive and that bridge existing divides rather than widening them.

### Hypothesis
If adults have lower levels of education, are older, or face financial struggles, then they are more likely to have lower literacy skills, because these factors are often linked to limited access to resources and opportunities for skill development. This lack of literacy may create barriers to effectively using AI tools that rely on natural language interactions.

## Background & Research
Research on adult literacy in the U.S. has shown that a large portion of the population lacks the skills needed to perform tasks that require complex reading comprehension. The National Assessment of Adult Literacy (NAAL) found that nearly 50% of U.S. adults have low literacy levels, meaning they can understand simple texts and instructions but struggle with more intricate language processing tasks (National Center for Education Statistics, 2003). This is particularly significant because many AI platforms require users to interact through written prompts, which makes these systems potentially inaccessible to people without strong literacy skills.

Moreover, research has shown that literacy gaps are not just limited to reading comprehension but also extend to digital literacy. A study by the Pew Research Center (2019) highlights that while a majority of Americans have access to digital technology, there is still a notable divide in how well individuals can use these technologies. Digital literacy has a strong correlation with traditional literacy levels, and people with low literacy skills are more likely to struggle with technology-based tasks, including using AI applications.

Previous studies on the usability of AI applications have also explored how the design of these systems can either facilitate or hinder interactions with users from diverse literacy backgrounds. Jakob Nielsen, in his article "Prompt-driven AI UX Hurts Usability" (2019), emphasizes the challenges posed by text-heavy AI interfaces, which often assume users are proficient in reading and interpreting complex prompts. Nielsen argues that these designs create usability friction for people with lower literacy skills, because they require users to navigate AI systems by writing detailed, language-based queries. This dependence on carefully worded natural language input often makes AI systems less intuitive for individuals who are not comfortable with advanced language constructs or specialized terminology, even if they have basic literacy.

Studies such as Shneiderman's (2000) work on “Universal Usability” advocate for the development of interfaces that are designed to be inclusive of all users, regardless of their literacy or cognitive abilities. The idea is to make technology more accessible by employing clear, simple language and alternative methods of interaction, such as voice commands or visual cues. This body of work has contributed to the understanding that AI applications can be designed in ways that reduce friction for users with varying levels of literacy.

The findings of these studies provide the foundation for this project. By investigating the literacy rates of adults in the U.S., particularly in the context of the specific literacy required to effectively interact with AI applications, this study aims to identify which demographics are most affected by these gaps. Previous research on the usability of AI systems and their reliance on high levels of literacy informs the decision to focus on this issue, as it highlights the need for AI interfaces that cater to a broader spectrum of literacy skills.

References:

- National Center for Education Statistics (2003). National Assessment of Adult Literacy. https://nces.ed.gov/naal/
- Pew Research Center (2019). Americans and Digital Knowledge. https://www.pewresearch.org/internet/2019/10/09/americans-and-digital-knowledge/
- Nielsen, J. (2019). Prompt-driven AI UX Hurts Usability. https://www.linkedin.com/pulse/prompt-driven-ai-ux-hurts-usability-jakob-nielsen/
- Shneiderman, B. (2000). Universal Usability. Communications of the ACM, 43(5), 84-91. https://dl.acm.org/doi/pdf/10.1145/332833.332843

## Data Source 
The dataset for this study is the Depression Student Dataset, sourced from Kaggle. It includes 502 entries with 11 features that cover a range of factors like academic pressure, study satisfaction, sleep duration, financial stress, and depression status. This dataset is a great fit for our research because it focuses on the kinds of variables that directly tie into academic and mental health experiences.

What makes this dataset useful is its mix of categorical variables (like gender and family history of mental illness) and numerical ones (like academic pressure and study hours). This balance gives us the flexibility to explore correlations and relationships from multiple angles. Plus, being open-source means the data is transparent and reproducible, which is key for building credible insights.

However, there are some concerns we need to keep in mind. The dataset is relatively small with just 502 entries, so while it’s good for identifying trends, it might not be fully representative of the broader student population. Additionally, we don’t have detailed context about how the data was collected, which could introduce biases or limit how broadly we can apply our findings. These are important factors to consider as we analyze and interpret the results.
                                                                                                      
Link: https://www.kaggle.com/datasets/ikynahidwin/depression-student-dataset

License: Creative Commons CCZero License (Publicly available data/license)


## Methods
I will use logistic regression to model the likelihood of individuals being able to effectively use AI applications based on their literacy scores and demographic factors (e.g., age, education, race). Additionally, I will apply chi-square tests to assess the relationships between categorical variables, such as age group and literacy level.
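
As a minimal sketch of the chi-square test of independence mentioned above, the example below computes the test statistic by hand for a small contingency table. The age groups, literacy levels, and counts are hypothetical placeholders chosen only for illustration; they are not taken from the project dataset, and the helper name `chi_square_independence` is my own.

```python
# Hypothetical counts (illustrative only, NOT real survey data).
# Rows are age groups; columns are literacy levels (low, medium, high).
age_groups = ["16-34", "35-54", "55+"]
observed = [
    [30, 50, 40],
    [40, 45, 35],
    [60, 40, 20],
]


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom, expected_counts) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


statistic, dof, expected = chi_square_independence(observed)

# 9.488 is the critical value of the chi-square distribution
# for 4 degrees of freedom at the 0.05 significance level.
CRITICAL_VALUE_DF4 = 9.488
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
print("reject independence at 0.05:", statistic > CRITICAL_VALUE_DF4)
```

In practice, `scipy.stats.chi2_contingency` computes the same statistic and also returns a p-value; note that for 2x2 tables it applies Yates' continuity correction by default, so its result can differ from the uncorrected statistic shown here.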

I will present the results using bar charts to visualize literacy levels across different demographics, and tables to summarize key relationships. For logistic regression results, I will provide a table with coefficients and odds ratios to demonstrate the relationship between literacy levels and the likelihood of effectively using AI systems.
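
The sketch below illustrates how a table of coefficients and odds ratios could be produced. It fits a logistic regression with plain gradient ascent so that it runs without third-party libraries. The predictors (`low_literacy`, `age_55_plus`), the outcome (whether a person can use an AI tool effectively), and all counts are assumptions made up for this example, not values from the project data.

```python
import math

# Hypothetical grouped data (illustrative only, not from any real dataset):
# (low_literacy, age_55_plus) -> (number able to use an AI tool effectively, group size)
groups = {
    (0, 0): (18, 20),
    (0, 1): (14, 20),
    (1, 0): (8, 20),
    (1, 1): (4, 20),
}

# Expand the grouped counts into one row per person:
# [intercept, low_literacy, age_55_plus]
X, y = [], []
for (low_lit, older), (successes, total) in groups.items():
    for i in range(total):
        X.append([1.0, float(low_lit), float(older)])
        y.append(1 if i < successes else 0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, learning_rate=0.5, iterations=5000):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        gradient = [0.0] * k
        for row, label in zip(X, y):
            error = label - sigmoid(sum(b * x for b, x in zip(beta, row)))
            for j in range(k):
                gradient[j] += error * row[j]
        beta = [b + learning_rate * g / n for b, g in zip(beta, gradient)]
    return beta


names = ["intercept", "low_literacy", "age_55_plus"]
coefficients = fit_logistic(X, y)

# The odds ratio for a predictor is exp(coefficient): the multiplicative change
# in the odds of the outcome when that predictor goes from 0 to 1.
odds_ratios = [math.exp(b) for b in coefficients]

print(f"{'term':<14}{'coefficient':>12}{'odds ratio':>12}")
for name, b, ratio in zip(names, coefficients, odds_ratios):
    print(f"{name:<14}{b:>12.3f}{ratio:>12.3f}")
```

An odds ratio below 1 means the predictor is associated with lower odds of the outcome. For the actual analysis, a statistics library such as statsmodels would be a more suitable choice, since it also reports standard errors and confidence intervals.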

These methods are appropriate because they allow for a clear understanding of how demographic factors and literacy levels influence the ability to interact with AI systems. The logistic regression model is especially suitable for examining binary outcomes, such as whether an individual has the literacy skills to use AI applications effectively. The visualizations will help present the findings in a digestible format for diverse audiences.

## Unknowns to Keep in Mind
While the Depression Student Dataset provides valuable insights into student mental health, there are several potential biases and unknowns that we need to consider. The dataset includes 502 entries, but it’s unclear where or how the data was collected, raising questions about its representativeness. Factors like geography, cultural differences, or varying educational systems might influence the relationships we’re studying but may not be captured here. Additionally, many variables, such as depression status or study satisfaction, are likely self-reported, which introduces the possibility of inaccuracies due to personal perceptions or reluctance to disclose true feelings. The dataset also lacks information about the timing and conditions under which the data was collected; for example, if it was gathered during exams or a disruptive period like the COVID-19 pandemic, the results might not reflect typical academic experiences. Furthermore, variables like "sleep duration" or "dietary habits" are grouped into broad categories, which could mask important nuances. By keeping these limitations in mind, we can interpret the results carefully and provide insights that reflect the strengths and constraints of the data.