## Redundant adjectives

Take a look at the images in Figure 1. How would you describe the circled item in Figure 1(a)? Would you call it "the triangle"? Or "the blue triangle"? How about in Figure 1(b)? Does your answer change?

![](fig-blue-triangle-shapes-1.png)
*(a) The circled triangle is the only triangle.*

![](fig-blue-triangle-shapes-2.png)
*(b) The circled triangle is the only blue triangle.*

*Figure 1: Two sets of four shapes.*

In Figure 1(a) the circled item is the only triangle, but in Figure 1(b) the circled item is one of two triangles. Although "the triangle" is a sufficient description of the circled item in Figure 1(a), many of us might choose to call it "the blue triangle" anyway. In Figure 1(b) there are two triangles, so "the triangle" is no longer sufficient, and to describe the circled item we must add the color: "the blue triangle".

Your answers to the above questions might be different if you’re answering in a language other than English. For example, in Spanish the adjective comes after the noun (e.g., "el triángulo azul"), so the incremental value of the additional adjective might be different for Figure 1(a).

Researchers studying the frequent use of redundant adjectives (e.g., referring to a single triangle as "the blue triangle") and the incrementality of language processing designed an experiment in which they showed the following two images to 22 native English speakers (undergraduates from University College London) and 22 native Spanish speakers (undergraduates from the Universidad de las Islas Baleares). They found that, in both languages, the subjects used more redundant color adjectives in denser displays, where doing so would be more efficient ([Rubio-Fernandez, Mollica, and Jara-Ettinger 2021](https://doi.org/10.1037/xge0000963)). One of the displays from the study is shown in Figure 2.

![](redundant-adjectives-blue-triangle.png)

*Figure 2: Images used in one of the experiments described in Rubio-Fernandez, Mollica, and Jara-Ettinger (2021).*

In this case study we will examine data from the redundant adjective study, which the authors have made available on Open Science Framework at [osf.io/9hw68](https://osf.io/9hw68/). The full reference is:

* Rubio-Fernandez, P., F. Mollica, and J. Jara-Ettinger. 2021. "Speakers and Listeners Exploit Word Order for Communicative Efficiency: A Cross-Linguistic Investigation." Journal of Experimental Psychology: General 150 (3): 583–94. https://doi.org/10.1037/xge0000963. 

Let's fire up the modules and load in the data!

In [1]:
# import Numpy library, rename as "np"
import numpy as np
# make random number generator
rng = np.random.default_rng()
# import Pandas library, rename as "pd"
import pandas as pd
# safe setting for Pandas
pd.set_option('mode.copy_on_write', True)

# Set up plotting
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

In [2]:
# read in data
df = pd.read_csv("./data/blue_triangle.csv")

# display first six rows
df.head(6)

,language,subject,items,n_questions,redundant_perc
0,English,1,4,10,100
1,English,1,16,10,100
2,English,2,4,10,0
3,English,2,16,10,0
4,English,3,4,10,100
5,English,3,16,10,100


The code output above shows the top six rows of the data. The full dataset has 88 rows. Remember that there are a total of 44 subjects in the study (22 English and 22 Spanish speakers). There are two rows in the dataset for each subject: one for the images with 4 items and the other for the images with 16 items. Each subject was asked 10 questions for each type of image (with a different layout of items on the image for each question). The variable of interest to us is `redundant_perc`, which gives the percentage of questions for which the subject used a redundant adjective to identify "the blue triangle". Note that the variable is a percentage, and we are interested in the average percentage. Therefore, we will use methods for means. If the variable had been "success or failure" (e.g., "used redundant or didn’t"), we would have used methods for proportions.
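Before analyzing the data, it can help to confirm that the layout really is as described. The sketch below runs that check on a tiny, made-up stand-in for the dataset. The name `toy`, its values, and the subject numbers for the Spanish speakers are assumptions for illustration only; with the real data you would apply the same calls to `df`.

```python
import pandas as pd

# Hypothetical miniature stand-in for blue_triangle.csv (all values made up).
toy = pd.DataFrame({
    "language": ["English"] * 4 + ["Spanish"] * 4,
    "subject": [1, 1, 2, 2, 23, 23, 24, 24],
    "items": [4, 16] * 4,
    "n_questions": [10] * 8,
    "redundant_perc": [100, 100, 0, 80, 0, 10, 20, 60],
})

# Rows per subject: expect 2 for everyone (one row per display type).
rows_per_subject = toy.groupby("subject").size()

# Number of distinct subjects in each language group.
subjects_per_language = toy.groupby("language")["subject"].nunique()

print(rows_per_subject)
print(subjects_per_language)
```

On the full dataset, the description above implies 2 rows for every subject and 22 distinct subjects per language.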

## Your task

You are to attempt the following three analyses:

### 1) Exploratory analysis

**Conduct an exploratory analysis**. Use code to summarize the results of the experiment, which you can report in written, tabular, and/or graphical format.

*Hint:* there should be four statistics, showing the percentage of redundant adjective use by language (English vs. Spanish speakers) and by number of items in image (4 objects vs. 16 objects).
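One possible way to get four such statistics is to group by both variables and take the mean. The sketch below shows the idea on made-up data (the `toy` dataframe and its values are assumptions, not the study's results); the same pattern should carry over to `df`.

```python
import pandas as pd

# Hypothetical miniature stand-in for the real data (values are made up).
toy = pd.DataFrame({
    "language": ["English"] * 4 + ["Spanish"] * 4,
    "subject": [1, 1, 2, 2, 23, 23, 24, 24],
    "items": [4, 16] * 4,
    "redundant_perc": [100, 100, 0, 80, 0, 10, 20, 60],
})

# Mean redundant percentage for each language / display-size combination.
# unstack moves the "items" level into columns, giving a 2-by-2 table.
summary = (
    toy.groupby(["language", "items"])["redundant_perc"]
    .mean()
    .unstack("items")
)

print(summary)
```

The result has one row per language and one column per display size, which is convenient for a table or a bar chart.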

### 2) Sparse vs dense displays

**Address the following question: "Do the data provide convincing evidence of a difference in mean redundant adjective usage percentages between sparse (4 item) and dense (16 item) displays for English speakers?"** Note that the English-speaking participants were each evaluated on both the 4 item and the 16 item displays. Therefore, the variable of interest is the difference in redundant percentage. Code has been provided below that calculates the difference in redundant percentage.

*Hint:* the [permutation pairs](https://lisds.github.io/textbook/permutation/permutation_pairs.html) code might help with the rest of the question...as this is paired data.

In [3]:
# filter for English speakers
df_english = df[df["language"] == "English"]
# pivot to create wide format dataframe
df_english_wide = df_english.pivot(index = "subject", columns = "items", values = "redundant_perc")
# create new variable diff_redundant_perc
df_english_wide["diff_redundant_perc"] = (df_english_wide[16] - df_english_wide[4])
# inspect new dataframe
df_english_wide.head(6)

subject,4,16,diff_redundant_perc
1,100,100,0
2,0,0,0
3,100,100,0
4,10,80,70
5,0,90,90
6,0,70,70
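One common way to test paired differences like these is a sign-flipping permutation test. If the display type made no difference, each subject's difference would be equally likely to be positive or negative. The sketch below illustrates this on a short, illustrative array of differences, not the full dataset. The names `diffs`, `fake_means`, and `n_iters`, and the fixed seed, are assumptions for the example; this is one possible approach rather than the required one.

```python
import numpy as np

# Seed fixed only so that this sketch gives repeatable numbers.
rng = np.random.default_rng(seed=0)

# Illustrative paired differences (16-item minus 4-item), one per subject.
# These are NOT the full study data.
diffs = np.array([0, 0, 0, 70, 90, 70, 20, -10])
observed = diffs.mean()

n_iters = 10_000
fake_means = np.zeros(n_iters)
for i in range(n_iters):
    # Under the null hypothesis the two display labels are exchangeable
    # within a subject, so randomly flip the sign of each difference.
    signs = rng.choice([-1, 1], size=len(diffs))
    fake_means[i] = (diffs * signs).mean()

# Two-sided p-value: proportion of simulated mean differences at least
# as far from zero as the observed mean difference.
p_value = np.mean(np.abs(fake_means) >= np.abs(observed))

print("Observed mean difference:", observed)
print("Permutation p-value:", p_value)
```

With the real data, `diffs` would be the `diff_redundant_perc` column of `df_english_wide` converted to an array.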


### 3) English vs Spanish speakers

**Address the following question: "How does redundant adjective usage differ between English speakers and Spanish speakers?"** The English speakers are independent from the Spanish speakers, but since the same subjects were shown the two types of displays, we can't combine data from the two display types (4 objects and 16 objects) together while maintaining independence of observations. Therefore, to answer questions about language differences, we will need to conduct two hypothesis tests, one for sparse displays and the other for dense displays.
   
*Hint:* the [population and permutation](https://lisds.github.io/textbook/permutation/population_permutation.html) page, and the subsequent two pages, might help with this question...

*Hint Hint:* you do not need to manipulate the dataframe as in question 2, but you might need to filter for the two display types.
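As a rough illustration of the filtering and shuffling involved, the sketch below runs a two-group permutation test for the sparse displays on made-up data. The `toy` dataframe, its values, the Spanish subject numbers, and the fixed seed are all assumptions. The same steps would be repeated with `items == 16` for the dense displays.

```python
import numpy as np
import pandas as pd

# Seed fixed only so that this sketch gives repeatable numbers.
rng = np.random.default_rng(seed=0)

# Hypothetical miniature stand-in for the real data (values are made up).
toy = pd.DataFrame({
    "language": ["English"] * 8 + ["Spanish"] * 8,
    "subject": [1, 1, 2, 2, 3, 3, 4, 4, 23, 23, 24, 24, 25, 25, 26, 26],
    "items": [4, 16] * 8,
    "redundant_perc": [100, 100, 0, 0, 10, 80, 40, 90,
                       0, 10, 20, 60, 0, 30, 10, 50],
})

# Keep only the sparse (4 item) displays, then split by language.
sparse = toy[toy["items"] == 4]
english = sparse.loc[sparse["language"] == "English", "redundant_perc"].to_numpy()
spanish = sparse.loc[sparse["language"] == "Spanish", "redundant_perc"].to_numpy()
observed = english.mean() - spanish.mean()

# Under the null hypothesis the language labels are arbitrary, so pool
# the values and repeatedly reshuffle them into two fake groups.
pooled = np.concatenate([english, spanish])
n_english = len(english)
n_iters = 10_000
fake_diffs = np.zeros(n_iters)
for i in range(n_iters):
    shuffled = rng.permutation(pooled)
    fake_diffs[i] = shuffled[:n_english].mean() - shuffled[n_english:].mean()

# Two-sided p-value for the difference in means.
p_value = np.mean(np.abs(fake_diffs) >= np.abs(observed))

print("Observed difference in means:", observed)
print("Permutation p-value:", p_value)
```

Because each subject contributes exactly one value within a display type, the observations in each test are independent, which is why the two display types are tested separately.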