**Analysis of Ranked-Choice Voting Data**

**1. Import Necessary Libraries:**

```python
import pandas as pd
import json
from sqlalchemy import create_engine
import statsmodels.api as sm
```

**Explanation:** We begin by importing the necessary libraries. `pandas` is used for data manipulation, `json` for parsing JSON strings, `sqlalchemy` to connect to the database, and `statsmodels` for regression analysis.

---

**2. Establishing Database Connection:**

```python
engine = create_engine("mysql+pymysql://root:password@localhost/snapshot_database")
```

**Explanation:** We create a connection to the MySQL database using `create_engine`. Adjust the connection string (`"mysql+pymysql://root:password@localhost/snapshot_database"`) to your specific database credentials and name.
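
The connection string above embeds the password in the notebook. One possible alternative, sketched here under the assumption that credentials live in environment variables, is to assemble the URL at run time. The `SNAPSHOT_DB_*` variable names and the `build_db_url` helper are hypothetical, not part of the original analysis.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ):
    """Assemble a SQLAlchemy-style MySQL URL from environment variables.

    The SNAPSHOT_DB_* names are hypothetical; rename them to match your setup.
    """
    user = env.get("SNAPSHOT_DB_USER", "root")
    # Percent-encode the password so characters like '@' do not break the URL.
    password = quote_plus(env.get("SNAPSHOT_DB_PASSWORD", ""))
    host = env.get("SNAPSHOT_DB_HOST", "localhost")
    name = env.get("SNAPSHOT_DB_NAME", "snapshot_database")
    return f"mysql+pymysql://{user}:{password}@{host}/{name}"


# Usage with the engine from this step:
# engine = create_engine(build_db_url())
```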

---

**3. Reading Data from the Database:**

```python
votes = pd.read_sql('SELECT * FROM votes', con=engine)
proposals = pd.read_sql('SELECT * FROM proposals', con=engine)
```

**Explanation:** We read the tables `votes` and `proposals` from the database into two separate pandas DataFrames.

---

**4. Merging Tables:**

```python
votes_proposals = votes.merge(proposals, left_on='proposal', right_on='id', suffixes=('_vote', '_proposal'))
```

**Explanation:** We merge the `votes` and `proposals` tables based on the proposal IDs. If columns have the same name in both tables, we append the suffixes `_vote` and `_proposal` to distinguish them.
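
Later steps refer to `id_proposal` and `space_vote`, which exist only because both tables share the column names `id` and `space`. The toy example below illustrates how the suffixes are applied; the demo frames and every column other than `proposal`, `id`, and `type` are assumptions about the real schema.

```python
import pandas as pd

# Toy stand-ins for the two tables (assumed schema, for illustration only).
votes_demo = pd.DataFrame({
    "id": ["v1", "v2"],
    "proposal": ["p1", "p1"],
    "space": ["dao-a", "dao-a"],
    "voter": ["0xabc", "0xdef"],
})
proposals_demo = pd.DataFrame({
    "id": ["p1"],
    "space": ["dao-a"],
    "type": ["ranked-choice"],
})

merged_demo = votes_demo.merge(
    proposals_demo,
    left_on="proposal",
    right_on="id",
    suffixes=("_vote", "_proposal"),
)

# Only overlapping names (`id`, `space`) receive a suffix;
# `proposal`, `voter`, and `type` keep their original names.
merged_columns = sorted(merged_demo.columns)
print(merged_columns)
```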

---

**5. Filtering Data for Ranked-Choice Voting:**

```python
votes_ranked = votes_proposals[votes_proposals['type'] == 'ranked-choice'].copy()
```

**Explanation:** We filter the data to retain only the rows where the voting type is 'ranked-choice'.

---

**6. Ranked-Choice Voting Calculation:**

```python
def ranked_choice_voting(choices_list):
    ...
```

**Explanation:** This function implements the ranked-choice (instant-runoff) tally. It counts first choices and returns any choice that holds more than 50% of the votes. If none does, it eliminates the least popular choice and moves those ballots to each voter's next-ranked choice, then repeats. The loop ends when a choice has a majority or only one choice remains. If the iteration cap is reached first, the function prints a message and returns `None`, so that proposal has no recorded winner.
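
The full function in the notebook cell computes `total_votes` once and, on every pass, re-reads only the original first choice of each ballot, which appears to be why it can run out of iterations without a winner. The sketch below is one possible alternative, not the code that produced the reported results: it recounts every ballot each round against the choices still in the race. The name `instant_runoff` and the arbitrary tie-breaking rule are assumptions.

```python
import json
from collections import Counter


def instant_runoff(ballots):
    """Return the instant-runoff winner, or None if there are no ballots.

    `ballots` is a list of rankings, each a list of choice identifiers
    ordered from most to least preferred.
    """
    remaining = {choice for ranking in ballots for choice in ranking}
    while remaining:
        # Count each ballot for its highest-ranked choice still in the race.
        tally = Counter({choice: 0 for choice in remaining})
        for ranking in ballots:
            for choice in ranking:
                if choice in remaining:
                    tally[choice] += 1
                    break
        # Majority is measured against ballots that are still active.
        active_votes = sum(tally.values())
        leader, leader_votes = tally.most_common(1)[0]
        if leader_votes > active_votes / 2 or len(remaining) == 1:
            return leader
        # Ties for last place are broken arbitrarily (an assumption).
        remaining.discard(min(tally, key=tally.get))
    return None


raw = ["[1, 2, 3]", "[1, 3, 2]", "[2, 3, 1]", "[3, 2, 1]", "[3, 2, 1]"]
ballots = [json.loads(s) for s in raw]
winner = instant_runoff(ballots)
print(winner)
```

In this toy election no choice has a first-round majority, choice 2 is eliminated, and its ballot transfers to choice 3, which then wins with three of five votes.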

---

**7. Identifying Winning Choices:**

```python
winning_choices_ranked = votes_ranked.groupby('id_proposal').apply(lambda group: ranked_choice_voting(group['choice'].tolist()))
votes_ranked['winning_choice'] = votes_ranked['id_proposal'].map(winning_choices_ranked)
```

**Explanation:** For each proposal, we apply the `ranked_choice_voting` function to determine the winning choice. The winning choices are then mapped back to the main DataFrame.

---

**8. Vote Alignment Calculation:**

```python
votes_ranked['aligned'] = votes_ranked.apply(lambda row: 1 if row['winning_choice'] in json.loads(row['choice'].replace("'", "\"")) else 0, axis=1)
```

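**Explanation:** For each vote, we check whether the winning choice appears anywhere in the voter's ranking. If it does, the vote is coded as aligned (`1`); otherwise it is coded as not aligned (`0`). Under this definition a ballot that ranks every option is always aligned when a winner exists, and votes on proposals with no winner (`None`) are always coded `0`.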
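
To make the alignment rule concrete, the sketch below wraps the parsing and the membership test in small helpers. `parse_ballot` and `aligned_any_rank` mirror the one-liner above; `aligned_first_rank` is a stricter, hypothetical variant included only for comparison and is not used in this analysis.

```python
import json


def parse_ballot(choice_str):
    """Parse a stored ranking such as "[2, 1, 3]" into a list."""
    return json.loads(choice_str.replace("'", '"'))


def aligned_any_rank(choice_str, winner):
    """Rule used in this step: the winner appears anywhere on the ballot."""
    return int(winner in parse_ballot(choice_str))


def aligned_first_rank(choice_str, winner):
    """Stricter variant (an assumption, not used above): winner ranked first."""
    ranking = parse_ballot(choice_str)
    return int(bool(ranking) and ranking[0] == winner)


full_ballot = "[2, 1, 3]"
print(aligned_any_rank(full_ballot, 3), aligned_first_rank(full_ballot, 3))
print(aligned_any_rank(full_ballot, None))  # proposal without a winner
```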

---

**9. Creating Lag Variables and Future Voting Indicator:**

```python
votes_ranked['previous_aligned'] = votes_ranked.groupby(['voter', 'space_vote'])['aligned'].shift()
votes_ranked['misaligned_previous'] = (votes_ranked['previous_aligned'] == 0).astype(int)
votes_ranked['future_voting'] = votes_ranked.groupby(['voter', 'space_vote'])['choice'].shift(-1).notna().astype(int)
```

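**Explanation:** We create a lag variable, `previous_aligned`, that holds the alignment of the voter's previous vote in the same DAO (`space_vote`). `misaligned_previous` is `1` when that previous vote was misaligned; it is `0` both when the previous vote was aligned and when the voter has no previous vote, because a missing lag does not compare equal to `0`. Lastly, `future_voting` indicates whether the voter has a later vote in the same DAO. Because `shift()` operates on row order, these variables are only meaningful if the rows are in chronological order.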
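
The snippet above relies on the rows already being in time order within each voter and DAO. A minimal sketch of making that explicit follows, using toy data; the timestamp column name `created_vote` and the extra `has_previous` flag are assumptions, not columns confirmed by the original tables.

```python
import pandas as pd

# Toy data, deliberately out of chronological order.
demo = pd.DataFrame({
    "voter": ["a", "a", "a", "b"],
    "space_vote": ["dao", "dao", "dao", "dao"],
    "created_vote": [3, 1, 2, 1],   # hypothetical timestamp column
    "aligned": [1, 0, 1, 0],
})

keys = ["voter", "space_vote"]

# Sort chronologically so that shift() really means "previous vote".
demo = demo.sort_values(keys + ["created_vote"]).reset_index(drop=True)

demo["previous_aligned"] = demo.groupby(keys)["aligned"].shift()
demo["misaligned_previous"] = (demo["previous_aligned"] == 0).astype(int)
# Distinguish first-time voters from voters whose previous vote was aligned.
demo["has_previous"] = demo["previous_aligned"].notna().astype(int)
demo["future_voting"] = demo.groupby(keys)["aligned"].shift(-1).notna().astype(int)

print(demo)
```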

---

**10. Regression Analysis:**

```python
X = votes_ranked[['misaligned_previous']]
X = sm.add_constant(X)
y = votes_ranked['future_voting']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())
```

**Explanation:** We fit a logistic regression (`sm.Logit`) of `future_voting` on `misaligned_previous`, with a constant term, to examine whether a misaligned previous vote is related to the likelihood of voting in a future proposal. The summary reports the coefficients, in log-odds units, and the statistical tests used to interpret this relationship.
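
With a single 0/1 predictor and a constant, the logit maximum-likelihood estimates have a closed form: the constant is the log-odds of the outcome in the group with predictor 0, and the coefficient is the difference in log-odds between the two groups. The sketch below shows this on made-up data; the helper names and the data are illustrative only and are not taken from the vote tables.

```python
import math


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))


def binary_logit_coefficients(x, y):
    """Closed-form logit estimates for one 0/1 predictor plus an intercept."""
    y_when_0 = [yi for xi, yi in zip(x, y) if xi == 0]
    y_when_1 = [yi for xi, yi in zip(x, y) if xi == 1]
    p0 = sum(y_when_0) / len(y_when_0)  # share of 1s when x == 0
    p1 = sum(y_when_1) / len(y_when_1)  # share of 1s when x == 1
    return logit(p0), logit(p1) - logit(p0)


# Made-up data: 60% vote again when x == 0, 40% when x == 1.
x = [0] * 10 + [1] * 10
y = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 6
const, slope = binary_logit_coefficients(x, y)
print(const, slope)
```

A negative slope here simply means the share of returning voters is lower in the `x == 1` group than in the `x == 0` group.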

---

This analysis allows us to examine how voters behave in a ranked-choice voting system and whether their past voting outcomes are associated with their future participation.

```python
import pandas as pd
import json
from sqlalchemy import create_engine
import statsmodels.api as sm

# Create an engine to the database
engine = create_engine("mysql+pymysql://root:password@localhost/snapshot_database")

# Read in the votes and proposals tables
votes = pd.read_sql('SELECT * FROM votes', con=engine)
proposals = pd.read_sql('SELECT * FROM proposals', con=engine)

# Merge votes and proposals
votes_proposals = votes.merge(proposals, left_on='proposal', right_on='id', suffixes=('_vote', '_proposal'))

# Filter to only ranked-choice voting
votes_ranked = votes_proposals[votes_proposals['type'] == 'ranked-choice'].copy()

def ranked_choice_voting(choices_list):
    first_choice_counts = {}

    # Count first choices
    for choice_str in choices_list:
        choice_list = json.loads(choice_str.replace("'", "\""))
        first_choice = choice_list[0]
        first_choice_counts[first_choice] = first_choice_counts.get(first_choice, 0) + 1

    total_votes = sum(first_choice_counts.values())
    MAX_ITERATIONS = len(choices_list)

    for _ in range(MAX_ITERATIONS):
        # Check if any choice has more than 50% of the votes
        for choice, count in first_choice_counts.items():
            if count > total_votes / 2:
                return choice

        # If no choice has majority, eliminate the least popular choice
        least_popular = min(first_choice_counts, key=first_choice_counts.get)
        del first_choice_counts[least_popular]

        # Re-allocate votes of the least popular choice
        for choice_str in choices_list:
            choice_list = json.loads(choice_str.replace("'", "\""))
            if choice_list[0] == least_popular:
                choice_list.pop(0)
                if choice_list:
                    next_choice = choice_list[0]
                    first_choice_counts[next_choice] = first_choice_counts.get(next_choice, 0) + 1

        # If only one choice remains, return it
        if len(first_choice_counts) == 1:
            return list(first_choice_counts.keys())[0]

    print("Maximum iterations reached without a winner. Exiting loop.")
    return None

winning_choices_ranked = votes_ranked.groupby('id_proposal').apply(lambda group: ranked_choice_voting(group['choice'].tolist()))
votes_ranked['winning_choice'] = votes_ranked['id_proposal'].map(winning_choices_ranked)

# Determine if a vote was aligned with the winning choice
votes_ranked['aligned'] = votes_ranked.apply(lambda row: 1 if row['winning_choice'] in json.loads(row['choice'].replace("'", "\"")) else 0, axis=1)

# Create a lag variable for previous alignment
votes_ranked['previous_aligned'] = votes_ranked.groupby(['voter', 'space_vote'])['aligned'].shift()

# Indicate if the previous vote was misaligned
votes_ranked['misaligned_previous'] = (votes_ranked['previous_aligned'] == 0).astype(int)

# Indicate if the voter voted in a subsequent proposal within the same DAO
votes_ranked['future_voting'] = votes_ranked.groupby(['voter', 'space_vote'])['choice'].shift(-1).notna().astype(int)

# Regression analysis
X = votes_ranked[['misaligned_previous']]
X = sm.add_constant(X)  # Adds a constant term to the predictor
y = votes_ranked['future_voting']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())
```

Maximum iterations reached without a winner. Exiting loop.

**Dependent Variable (Response)**: `future_voting`
- This variable indicates whether a voter voted in a subsequent proposal within the same DAO.

---

**Model Statistics**:

- **Observations**: 109,180
  - This is the number of ranked-choice votes (rows) used to fit the model.
  
- **Pseudo R-squared**: 0.0001450
  - This value is a goodness-of-fit measure. statsmodels reports McFadden's pseudo R-squared, defined as `1 - (Log-Likelihood / LL-Null)`. Values closer to 1 indicate a better fit, although pseudo R-squared values in logistic regression are typically much lower than the R-squared of a linear regression. Here the value is very close to 0: the predictor improves the log-likelihood over an intercept-only model only marginally.
  
- **Log-Likelihood**: -75,199
  - This is the log of the likelihood function at its maximum. It is useful for comparing models fitted to the same data, where higher (less negative) values indicate a better fit.

- **LL-Null**: -75,210
  - This is the log-likelihood of a model with no predictors (just an intercept). The difference between this value and the log-likelihood of the model gives an idea of how much the predictors improved the model.

- **LLR p-value**: 3.017e-06
  - The likelihood-ratio (LLR) test evaluates the null hypothesis that all coefficients except the constant are equal to zero. The small p-value indicates that the model fits the data significantly better than the intercept-only (null) model.
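
The reported statistics are tied together by two short formulas, sketched below with the rounded log-likelihoods from this list. Because the inputs are rounded, the results only approximately reproduce the reported pseudo R-squared and LLR p-value. The single degree of freedom reflects the one predictor in the model.

```python
import math

log_likelihood = -75199.0  # rounded value reported above
ll_null = -75210.0         # rounded value reported above

# McFadden's pseudo R-squared: relative improvement over the null model.
pseudo_r2 = 1 - log_likelihood / ll_null

# Likelihood-ratio statistic; chi-squared with 1 degree of freedom
# under the null hypothesis because there is a single predictor.
llr = 2 * (log_likelihood - ll_null)

# Survival function of chi-squared(1): P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(llr / 2))

print(f"pseudo R-squared: {pseudo_r2:.7f}")
print(f"LLR statistic:    {llr:.1f}")
print(f"LLR p-value:      {p_value:.3e}")
```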

---

**Coefficients**:

1. **const (Intercept)**
   - **Coefficient**: 0.1891
     - This is the log-odds of voting in a future proposal when `misaligned_previous` is 0, that is, for votes whose previous vote was aligned or that have no previous vote in the DAO.
   - **P>|z|**: 0.000
     - The p-value rounds to 0.000, indicating that the intercept differs significantly from zero.

2. **misaligned_previous**
   - **Coefficient**: -0.2243
     - This is the difference in the log-odds of voting in a future proposal between votes with `misaligned_previous` equal to 1 and votes with `misaligned_previous` equal to 0. The negative sign indicates that voters whose previous vote was misaligned are less likely to vote in a future proposal.
   - **P>|z|**: 0.000
     - This p-value is less than 0.05, indicating that the variable `misaligned_previous` is statistically significant in predicting the dependent variable.

---

**Interpretation**:

The regression results suggest that a misaligned vote in a previous proposal is associated with a lower likelihood of participating in a future proposal within the same DAO. Specifically, the log-odds of voting in a future proposal are 0.2243 lower for voters whose previous vote was misaligned than for the baseline group (previous vote aligned, or no previous vote). This corresponds to multiplying the odds by exp(-0.2243) ≈ 0.80.
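
Log-odds are hard to read directly, so it can help to convert the two reported coefficients into an odds ratio and predicted probabilities. The sketch below uses only the coefficient values stated above; the variable names are illustrative.

```python
import math

const = 0.1891             # intercept reported above
coef_misaligned = -0.2243  # coefficient on misaligned_previous


def sigmoid(z):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-z))


odds_ratio = math.exp(coef_misaligned)
p_baseline = sigmoid(const)
p_misaligned = sigmoid(const + coef_misaligned)

print(f"odds ratio:                  {odds_ratio:.3f}")
print(f"P(future vote | baseline):   {p_baseline:.3f}")
print(f"P(future vote | misaligned): {p_misaligned:.3f}")
```

The predicted probability of voting again falls from roughly 0.55 in the baseline group to roughly 0.49 after a misaligned vote.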

It's essential to note that, although the results are statistically significant, the pseudo R-squared indicates that the model barely improves on an intercept-only model. Other factors not included in the model evidently account for most of the variation in whether a voter participates in future proposals.