## Analyzing Quadratic Voting Patterns in Proposals

### Setting up the Environment

```python
import pandas as pd
import json
from sqlalchemy import create_engine
import statsmodels.api as sm
```

We start by importing the necessary libraries: `pandas` for data manipulation, `json` for parsing JSON-formatted strings, `sqlalchemy` for database connections, and `statsmodels` for regression analysis.

### Connecting to the Database

```python
engine = create_engine("mysql+pymysql://root:password@localhost/snapshot_database")
```

SQLAlchemy's `create_engine` sets up a connection to the local `snapshot_database` MySQL database through the `pymysql` driver.

### Retrieving Data

```python
votes = pd.read_sql('SELECT * FROM votes', con=engine)
proposals = pd.read_sql('SELECT * FROM proposals', con=engine)
```

The votes and proposals tables from the database are loaded into pandas DataFrames.

### Merging Data

```python
votes_proposals = votes.merge(proposals, left_on='proposal', right_on='id', suffixes=('_vote', '_proposal'))
```

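Each vote is joined to its proposal by matching the `proposal` column of `votes` against the `id` column of `proposals`. Columns that share a name in both tables receive the `_vote` or `_proposal` suffix, which is why later steps refer to `id_proposal` and `space_vote`.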
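To make the suffix behavior concrete, here is a minimal sketch on toy data. The column names mirror the ones used in this notebook, but the rows are invented for illustration and the real tables likely contain many more columns.

```python
import pandas as pd

# Toy stand-ins for the two tables; the values are made up for illustration.
votes = pd.DataFrame({
    "id": ["v1", "v2", "v3"],
    "proposal": ["p1", "p1", "p2"],
    "space": ["dao-a", "dao-a", "dao-a"],
})
proposals = pd.DataFrame({
    "id": ["p1", "p2"],
    "space": ["dao-a", "dao-a"],
    "type": ["quadratic", "single-choice"],
})

merged = votes.merge(
    proposals, left_on="proposal", right_on="id", suffixes=("_vote", "_proposal")
)

# Columns present in both frames get a suffix; unique ones keep their name.
print(sorted(merged.columns))
```

Because `merge` defaults to an inner join, votes whose `proposal` has no matching row in `proposals` would be dropped silently.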

### Filtering for Quadratic Voting

```python
votes_quadratic = votes_proposals[votes_proposals['type'] == 'quadratic'].copy()
```

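We keep only the rows whose proposal `type` is `'quadratic'`. The `.copy()` call makes the result an independent DataFrame, so new columns can be added later without pandas warning about writing to a slice.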

### Defining the Quadratic Winning Choice Function

The `get_quadratic_winning_choice` function takes the list of `choice` strings cast on one proposal and returns the choice with the largest aggregated weight, or `None` if no vote could be parsed. It also accepts a `scores` argument, which the current implementation does not use.

- If a choice parses to a dictionary, the square root of each non-negative numeric weight is added to the running total for that choice.
- If a choice parses to a list, each item in the list adds 1 to its total.
- Strings that fail JSON decoding are printed for debugging and skipped; parsed values of any other type are ignored silently.
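The square-root aggregation described above can be sketched on a few hypothetical ballots. The ballots and weights below are invented; the point is only to show that a choice with the largest raw weight can still lose once each ballot's weight is square-rooted.

```python
import math

# Hypothetical ballots: each maps a choice label to the weight placed on it.
ballots = [
    {"1": 9},  # one voter puts 9 units on choice "1"
    {"2": 4},  # two voters put 4 units each on choice "2"
    {"2": 4},
]

totals = {}
for ballot in ballots:
    for choice, weight in ballot.items():
        # Add the square root of the weight, not the weight itself.
        totals[choice] = totals.get(choice, 0.0) + math.sqrt(weight)

winner = max(totals, key=totals.get)
print(totals)  # {'1': 3.0, '2': 4.0}
print(winner)  # 2
```

On raw weights choice "1" would lead 9 to 8, but after the square root choice "2" wins 4.0 to 3.0.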

### Applying the Function

```python
winning_choices_quadratic = votes_quadratic.groupby('id_proposal').apply(lambda group: get_quadratic_winning_choice(group['choice'].tolist(), group['scores'].tolist()))
votes_quadratic['winning_choice_updated'] = votes_quadratic['id_proposal'].map(winning_choices_quadratic)
```

For each proposal, the function is applied to find the winning choice. This winning choice is then added as a new column in the `votes_quadratic` DataFrame.

### Further Analysis

1. Determine if a user's vote was aligned with the winning choice.
2. Check if the user's previous vote was aligned.
3. Check if the user voted in a subsequent proposal within the same DAO.
4. Run a logistic regression to test whether users whose previous vote was misaligned are less likely to vote in the future.
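Steps 2 and 3 rely on lagging and leading values within each voter and space. The sketch below shows the idea on toy data. The `created` column is a hypothetical timestamp used only to order the rows, and the sort is an assumption about how the data should be prepared, since `shift()` follows whatever row order the DataFrame already has.

```python
import pandas as pd

df = pd.DataFrame({
    "voter": ["a", "a", "a", "b"],
    "space_vote": ["dao", "dao", "dao", "dao"],
    "created": [1, 2, 3, 1],          # hypothetical timestamp column
    "aligned_updated": [0, 1, 1, 1],
})

# shift() works on row order, so sort chronologically within each group first.
df = df.sort_values(["voter", "space_vote", "created"])
grouped = df.groupby(["voter", "space_vote"])

# Lag: alignment of the voter's previous vote (NaN for a first vote).
df["previous_aligned"] = grouped["aligned_updated"].shift()
# NaN == 0 is False, so first votes are coded 0 here, the same as aligned ones.
df["misaligned_previous"] = (df["previous_aligned"] == 0).astype(int)
# Lead: 1 if the voter has a later vote in the same space.
df["future_voting"] = grouped["created"].shift(-1).notna().astype(int)

print(df)
```

Note that a voter's first vote ends up with `misaligned_previous == 0`, so the reference group in the regression mixes "previous vote aligned" with "no previous vote".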

### Regression Analysis

```python
X = votes_quadratic[['misaligned_previous']]
X = sm.add_constant(X)  
y = votes_quadratic['future_voting']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())
```

A logistic regression model is fit to predict the likelihood of a user voting in a subsequent proposal based on whether their previous vote was misaligned with the winning choice. The results are printed, providing coefficients, p-values, and other statistics.

```python
import pandas as pd
import json
from sqlalchemy import create_engine
import statsmodels.api as sm

# Create an engine to the database
engine = create_engine("mysql+pymysql://root:password@localhost/snapshot_database")

# Read in the votes and proposals tables
votes = pd.read_sql('SELECT * FROM votes', con=engine)
proposals = pd.read_sql('SELECT * FROM proposals', con=engine)

# Merge votes and proposals
votes_proposals = votes.merge(proposals, left_on='proposal', right_on='id', suffixes=('_vote', '_proposal'))

# Filter to only quadratic voting
votes_quadratic = votes_proposals[votes_proposals['type'] == 'quadratic'].copy()

def get_quadratic_winning_choice(choices, scores):
    # Note: `scores` is accepted for convenience but is not used below
    total_weights = {}
    for idx, choice_str in enumerate(choices):
        try:
            # Try to load the choice as a dictionary
            choice_dict = json.loads(choice_str.replace("'", "\""))
            if isinstance(choice_dict, dict):
                for choice, weight in choice_dict.items():
                    # Check the type first so a non-numeric weight cannot raise
                    # a TypeError in the comparison, then ignore negative weights
                    if isinstance(weight, (int, float)) and weight >= 0:
                        total_weights[choice] = total_weights.get(choice, 0) + (weight**0.5)
            elif isinstance(choice_dict, list):
                # If choice is a list, add 1 to the weight for each choice in the list
                for choice in choice_dict:
                    total_weights[choice] = total_weights.get(choice, 0) + 1
        except json.JSONDecodeError:
            # If there's a decoding error, print the choice string for debugging
            print(f"Unexpected data format for choice: {choice_str}")
            continue

    # Ensure total_weights isn't empty before finding max
    if not total_weights:
        return None
    return max(total_weights, key=total_weights.get)

# Apply the function to get winning choices for each proposal
winning_choices_quadratic = votes_quadratic.groupby('id_proposal').apply(lambda group: get_quadratic_winning_choice(group['choice'].tolist(), group['scores'].tolist()))
votes_quadratic['winning_choice_updated'] = votes_quadratic['id_proposal'].map(winning_choices_quadratic)

# Determine if a vote was aligned with the winning choice
votes_quadratic['aligned_updated'] = votes_quadratic.apply(lambda row: 1 if row['winning_choice_updated'] in json.loads(row['choice'].replace("'", "\"")) else 0, axis=1)

# Create a lag variable for previous alignment
# Note: shift() follows the current row order, so rows need to be in
# chronological order within each voter and space for "previous" to be meaningful
votes_quadratic['previous_aligned'] = votes_quadratic.groupby(['voter', 'space_vote'])['aligned_updated'].shift()

# Indicate if the previous vote was misaligned
# (a first vote has no previous_aligned value and is coded 0 here)
votes_quadratic['misaligned_previous'] = (votes_quadratic['previous_aligned'] == 0).astype(int)

# Indicate if the voter voted in a subsequent proposal within the same DAO
votes_quadratic['future_voting'] = votes_quadratic.groupby(['voter', 'space_vote'])['choice'].shift(-1).notna().astype(int)

# Regression analysis
X = votes_quadratic[['misaligned_previous']]
X = sm.add_constant(X)  # Adds a constant term to the predictor
y = votes_quadratic['future_voting']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())
```


```text
Optimization terminated successfully.
         Current function value: 0.682254
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:          future_voting   No. Observations:                37127
Model:                          Logit   Df Residuals:                    37125
Method:                           MLE   Df Model:                            1
Date:                Wed, 06 Sep 2023   Pseudo R-squ.:                0.008228
Time:                        11:48:27   Log-Likelihood:                -25330.
converged:                       True   LL-Null:                       -25540.
Covariance Type:            nonrobust   LLR p-value:                 2.135e-93
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                   0.1247      0.011     11.196      0.000       0.103       0.147
mi
```

## Logistic Regression Analysis on Future Voting Behavior

### Model Overview:

- **Dependent Variable (DV)**: `future_voting`
    - This is what you're trying to predict: a binary variable indicating whether the voter cast another vote on a later proposal in the same space (1 for yes, 0 for no).
  
- **Independent Variable (IV)**: `misaligned_previous`
    - This is your predictor or explanatory variable: a binary variable equal to 1 when the voter's previous vote in the same space was misaligned with the winning choice, and 0 otherwise (previous vote aligned, or no previous vote at all).

### Key Results:

1. **Pseudo R-squared**: 0.008228
    - This is McFadden's pseudo R-squared, computed as 1 - (Log-Likelihood / LL-Null). Unlike the R-squared in linear regression, it is not a proportion of variance explained. The low value still indicates that `misaligned_previous` alone accounts for very little of the variation in future voting behavior.

2. **Log-Likelihood**: -25330
    - This is the log of the likelihood function value for the estimated model. It's used for model comparison. A model with a higher log-likelihood is preferred to a model with a lower one.

3. **LL-Null**: -25540
    - This is the log-likelihood of a model with no predictors, i.e., only an intercept. It serves as a baseline against which the log-likelihood of the estimated model is compared.

4. **LLR p-value**: 2.135e-93
    - This is the p-value for the likelihood ratio test comparing the fit of the model with predictors to the fit of the model with only an intercept. The very small p-value suggests that your model with the predictor `misaligned_previous` fits significantly better than a model with no predictors.
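As a rough check, the fit statistics above can be reproduced from the two log-likelihoods alone. This sketch uses the rounded values printed in the summary, so the results only approximate the reported figures. It assumes the usual definitions: McFadden's pseudo R-squared and a likelihood ratio statistic compared against a chi-square distribution with one degree of freedom.

```python
import math

ll_model = -25330.0  # Log-Likelihood, rounded as printed in the summary
ll_null = -25540.0   # LL-Null, rounded as printed in the summary

# McFadden's pseudo R-squared
pseudo_r2 = 1 - ll_model / ll_null

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat = 2 * (ll_model - ll_null)

# Chi-square survival function for 1 degree of freedom:
# P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"pseudo R-squared: {pseudo_r2:.6f}")
print(f"LR statistic:     {lr_stat:.1f}")
print(f"LLR p-value:      {p_value:.3e}")
```

The pseudo R-squared comes out near 0.0082 and the p-value on the order of 1e-93, in line with the summary; the small differences come from the rounding of the log-likelihoods.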

### Coefficients:

- **const (Intercept)**:
    - Coefficient: 0.1247
    - This is the log odds of voting in a subsequent proposal when `misaligned_previous` is 0, that is, when the previous vote was aligned or there is no previous vote in the same space.
    - The positive coefficient means the odds of voting again are better than even for this group.

- **misaligned_previous**:
    - Coefficient: 0.6632
    - This coefficient represents the change in the log odds of voting in a subsequent proposal for someone whose previous vote was misaligned compared to someone whose vote was aligned.
    - The positive coefficient suggests that individuals who had a misaligned vote in the past have higher log odds of voting in a subsequent proposal compared to those who had aligned votes.

The intercept is statistically significant (its P>|z| is reported as 0.000). Because the model has a single predictor, the LLR p-value of 2.135e-93 is also a test of `misaligned_previous`, so its effect is clearly different from zero as well.
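Log odds are hard to read directly, so it can help to convert the two coefficients into an odds ratio and predicted probabilities. The sketch below uses the coefficient values quoted above; the `logistic` helper is defined here for illustration and is not part of the notebook.

```python
import math

const = 0.1247  # intercept from the summary
beta = 0.6632   # misaligned_previous coefficient quoted above

def logistic(z):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

odds_ratio = math.exp(beta)            # multiplicative change in the odds
p_baseline = logistic(const)           # misaligned_previous == 0
p_misaligned = logistic(const + beta)  # misaligned_previous == 1

print(f"odds ratio:            {odds_ratio:.3f}")
print(f"P(vote again | 0):     {p_baseline:.3f}")
print(f"P(vote again | 1):     {p_misaligned:.3f}")
```

Under these coefficients the odds of voting again are roughly 1.94 times higher after a misaligned vote, which corresponds to a predicted probability of about 0.69 versus about 0.53 for the reference group.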

---

In summary, the regression results suggest that having a misaligned previous vote is associated with a higher likelihood of voting in a subsequent proposal, which is the opposite of the drop-off the analysis set out to test for.