<center> <h3> Do experts overrate the extent of their expertise? </h3></center>  
This lab activity uses the open data from Study 1b of Atir, Rosenzweig, & Dunning (2015) to teach multiple
regression. The results of the activity below should *exactly* reproduce the results reported in the paper.

**CITATION**  
Atir, S., Rosenzweig, E., & Dunning, D. (2015). When knowledge knows no bounds: Self-perceived
expertise predicts claims of impossible knowledge. Psychological Science, 26, 1295-1303.

**LEARNING OBJECTIVES**  
* Calculate descriptive statistics.
* Conduct multiple regression analyses.
* Conduct t-tests.

**STUDY DESCRIPTION**  
Valuing expertise is important for modern life. When people have a problem, they need to know who to
turn to for a solution to their problem. For example, when people get sick, they know that a doctor is an
expert in the field of medicine and can help them get better. In general, experts simply know more about
a topic than do non-experts. However, knowing so much may leave experts vulnerable to a particular
problem: the illusion that they know more about a topic than they actually do.

This particular type of overconfidence is called *overclaiming*. Essentially, overclaiming occurs when people
claim that they know something that is impossible to know, such as claiming to know the capital of
Sharambia (a country that doesn’t actually exist).

To test if experts are susceptible to overclaiming, Atir, Rosenzweig, and Dunning (2015) recruited 202
individuals from an online participant pool. They first asked participants to complete either a measure of
self-perceived knowledge, or an overclaiming task (to test for a possible order effect, half of the
participants completed the measure of perceived knowledge first, whereas the other half completed the
overclaiming task first). The self-perceived knowledge questionnaire asked people to indicate their level
of knowledge in the area of personal finance. The overclaiming task asked participants to indicate how
much they knew about 15 terms related to personal finance (e.g., home equity). Included in the 15 items
were three terms that do not actually exist (e.g., annualized credit). Thus, overclaiming occurred when
participants said that they were knowledgeable about the non-existent terms. Finally, participants
completed a test of financial literacy called the FINRA. Whereas the earlier questionnaires measured
self-perceived knowledge, the FINRA measured actual knowledge.

**Analyses**

1. Open the data file (called Atir Rosenzweig Dunning 2015 Study 1b).

In [None]:
# Load libraries
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

# Uncomment and set the directory if you are loading the dataset from your local machine
# data_dir = '<insert path to directory where dataset is located>'
# os.chdir(data_dir)


# Load dataset into dataframe
file_id = '0Bz-rhZ21ShvOTnM5YmJQOHpZNzA'
resource_key = '0-d12XfcSZxwDJc-PKPwC0rQ'

# Construct a direct download link
direct_link = f'https://drive.google.com/uc?export=download&id={file_id}&resourcekey={resource_key}'
df = pd.read_csv(direct_link) 

print(df.head())

2. First, calculate means and standard deviations for overclaiming.

In [None]:
# NumPy provides functions for numerical operations like mean and standard deviation
overclaiming_mean = np.mean(df['column_name'])  # Replace 'column_name' with column you're looking for
print(f'Mean of overclaiming: {overclaiming_mean}')

# Use the docs (https://numpy.org/doc/stable/user/absolute_beginners.html) to find the standard deviation function,
# then uncomment and complete the lines below with the right function
# overclaiming_std = 
# print(f'Standard Deviation of overclaiming: {overclaiming_std}')
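
One detail matters when comparing your output with published values: NumPy and pandas use different default denominators for the standard deviation. The sketch below uses a handful of made-up scores (not the study data) to show the difference. Published descriptive statistics are typically sample standard deviations, so it is worth checking which version you are computing.

```python
import numpy as np
import pandas as pd

# Hypothetical scores invented for illustration; not taken from the study data.
scores = np.array([0.20, 0.40, 0.40, 0.60, 0.90])

population_sd = np.std(scores)        # NumPy default ddof=0: divides by n
sample_sd = np.std(scores, ddof=1)    # ddof=1: divides by n - 1
pandas_sd = pd.Series(scores).std()   # pandas default is ddof=1

print(f"Mean: {np.mean(scores):.2f}")
print(f"SD (ddof=0): {population_sd:.3f}")
print(f"SD (ddof=1): {sample_sd:.3f}")
print(f"pandas SD:   {pandas_sd:.3f}")
```

With a sample of about 200 participants the two versions differ only slightly, but the difference can be enough to prevent an exact match at the reported precision.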

3. You next want to examine the relationship between self-perceived knowledge and overclaiming. You
also want to take into account the accuracy with which participants responded during the overclaiming
task (that is, their ability to distinguish between the 12 real terms and the 3 fake terms). Conduct
an analysis that uses both self-perceived knowledge and accuracy to predict overclaiming.

In [None]:
# statsmodels for regression analysis
X = df[['predictor', 'another predictor']]  # Replace with your independent variables
y = df['outcome variable']  # Replace with your dependent variable 
X = sm.add_constant(X)  # Adds an intercept (constant) column to the predictors

# fit the model
model = sm.OLS(y, X).fit()
predictions = model.predict(X)

# Print out the results
print(model.summary())


4. You next want to determine whether there is an order effect (based on whether participants
completed the self-perceived knowledge measure first or the overclaiming task first). Compare the mean
level of overclaiming based on the order of the tasks.

In [None]:
# which test should you use to compare means between two groups?
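
As a rough sketch of one way to compare two group means, the example below runs an independent-samples t-test with `scipy.stats.ttest_ind` on synthetic data. The group names, means, and sample sizes are invented for illustration. With the real dataset you would instead split the overclaiming column by the task-order column, whose names you will need to look up in `df.columns`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic scores for two hypothetical groups; not the study data.
knowledge_first = rng.normal(loc=0.30, scale=0.10, size=50)
overclaiming_first = rng.normal(loc=0.35, scale=0.10, size=50)

# Student's t-test (assumes equal variances in the two groups)
student = stats.ttest_ind(knowledge_first, overclaiming_first)

# Welch's t-test (does not assume equal variances)
welch = stats.ttest_ind(knowledge_first, overclaiming_first, equal_var=False)

# Degrees of freedom for the Student version: n1 + n2 - 2
df_student = len(knowledge_first) + len(overclaiming_first) - 2

print(f"Student: t({df_student}) = {student.statistic:.2f}, p = {student.pvalue:.3f}")
print(f"Welch:   t = {welch.statistic:.2f}, p = {welch.pvalue:.3f}")
```

Which version is appropriate depends on whether the equal-variance assumption is reasonable for your groups.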

5. If you found a significant difference in overclaiming in the analysis above (#4), repeat the analysis
from #3 to check whether the relationship between self-perceived knowledge and overclaiming changes
when the order of the tasks is taken into account.

In [None]:
# another regression analysis
# predictors: self-perceived knowledge
# controls: order of tasks
# outcome: overclaiming
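
The outline above could be sketched with the statsmodels formula interface, which handles a categorical control such as task order through `C(...)`. Everything in this example is an assumption for illustration: the data are simulated and the column names `self_knowledge`, `task_order`, and `overclaiming` are hypothetical, so substitute the actual column names from the dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200

# Simulated data with hypothetical column names; not the study data.
toy = pd.DataFrame({
    "self_knowledge": rng.normal(4.0, 1.0, n),
    "task_order": rng.integers(0, 2, n),  # 0 or 1, one code per task order
})
toy["overclaiming"] = (
    0.10
    + 0.05 * toy["self_knowledge"]
    + 0.02 * toy["task_order"]
    + rng.normal(0.0, 0.10, n)
)

# C(task_order) treats the order code as a categorical predictor
order_model = smf.ols("overclaiming ~ self_knowledge + C(task_order)", data=toy).fit()
print(order_model.summary())
```

Comparing the `self_knowledge` coefficient here with the one from a model that omits `task_order` shows whether the control changes the relationship.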


6. You next want to determine whether self-perceived knowledge still predicts overclaiming while
accounting for the variance due to genuine expertise, as measured by the FINRA. First, find the mean and
standard deviation for scores on the FINRA. Then repeat the analysis from #3, but this time include
scores on the FINRA as an additional predictor variable.

In [None]:
# calculate mean and standard deviation as you did above


# run regression analysis as you did above


7. Prepare an APA-style results section for the analyses you completed.

In [None]:
# Extract key statistics from a fitted model (e.g., `model` from step 3).
# Replace 'predictor1' and 'predictor2' with the predictor column names you used.
slope1 = model.params['predictor1']
slope2 = model.params['predictor2']
t_value1 = model.tvalues['predictor1']
t_value2 = model.tvalues['predictor2']
p_value1 = model.pvalues['predictor1']
p_value2 = model.pvalues['predictor2']
p_value_model = model.f_pvalue  # p-value of the overall F test (not of the intercept)
r_squared = model.rsquared
f_value = model.fvalue
df_model = int(model.df_model)
df_resid = int(model.df_resid)

# APA-style formatted output for the multiple regression.
# Edit the wording to state whether the overall model was significant.
apa_report = (
    f"A multiple linear regression was conducted to predict y from X1 and X2. "
    f"The overall model test was F({df_model}, {df_resid}) = {f_value:.2f}, p = {p_value_model:.3f}, "
    f"with an R² of {r_squared:.2f}. "
    f"The slope for X1 was {slope1:.2f} (SE = {model.bse['predictor1']:.2f}), t({df_resid}) = {t_value1:.2f}, p = {p_value1:.3f}. "
    f"The slope for X2 was {slope2:.2f} (SE = {model.bse['predictor2']:.2f}), t({df_resid}) = {t_value2:.2f}, p = {p_value2:.3f}."
)
print(apa_report)

# You've formatted other tests for APA reports. Do that again here if you ran other tests.
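
For the other tests, a small helper can keep the formatting consistent. The sketch below is one possible approach, not a required one: `format_p` and `apa_t_test` are hypothetical helper names, and the numbers in the usage line are placeholders rather than results from the study. It follows the common APA conventions of dropping the leading zero from p-values and reporting very small values as `p < .001`.

```python
def format_p(p):
    """Format a p-value APA style: no leading zero, floor at 'p < .001'."""
    if p < 0.001:
        return "p < .001"
    # Drop the leading zero, e.g. '0.023' becomes '.023'
    return f"p = {p:.3f}".replace("0.", ".", 1)


def apa_t_test(t, df, p):
    """Return a t-test result string such as 't(98) = 2.50, p = .014'."""
    return f"t({df}) = {t:.2f}, {format_p(p)}"


# Placeholder values for illustration only
print(apa_t_test(2.5, 98, 0.014))
```

You can pass the `statistic` and `pvalue` attributes of a SciPy t-test result, along with the degrees of freedom, straight into `apa_t_test`.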