
### Weighted Voting Analysis

#### 1. Library Imports
First, we import the necessary libraries.
```python
import pandas as pd
import json
from sqlalchemy import create_engine
import statsmodels.api as sm
```

#### 2. Database Connection
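We create a SQLAlchemy engine for a local MySQL database named `snapshot_database`, connecting through the `pymysql` driver. The credentials are hard-coded in the connection URL here; replace them with your own.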
```python
engine = create_engine("mysql+pymysql://root:password@localhost/snapshot_database")
```
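One way to keep credentials out of the notebook is to assemble the connection URL from environment variables. The sketch below is only an illustration: the helper `build_db_url` and the variable names `SNAPSHOT_DB_USER`, `SNAPSHOT_DB_PASSWORD`, `SNAPSHOT_DB_HOST`, and `SNAPSHOT_DB_NAME` are hypothetical and do not come from the original analysis.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ):
    """Assemble a MySQL connection URL from environment variables.

    All variable names below are hypothetical; adapt them to your setup.
    """
    user = env.get("SNAPSHOT_DB_USER", "root")
    # Percent-encode the password so characters such as '@' or '/' stay valid in a URL
    password = quote_plus(env.get("SNAPSHOT_DB_PASSWORD", ""))
    host = env.get("SNAPSHOT_DB_HOST", "localhost")
    name = env.get("SNAPSHOT_DB_NAME", "snapshot_database")
    return f"mysql+pymysql://{user}:{password}@{host}/{name}"


# Demonstration with an explicit mapping instead of the real environment
db_url = build_db_url({"SNAPSHOT_DB_USER": "analyst", "SNAPSHOT_DB_PASSWORD": "p@ss/word"})
print(db_url)
```

The resulting string can be passed to `create_engine` in place of the literal URL.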

#### 3. Data Extraction
We extract the `votes` and `proposals` tables from the database.
```python
votes = pd.read_sql('SELECT * FROM votes', con=engine)
proposals = pd.read_sql('SELECT * FROM proposals', con=engine)
```

#### 4. Data Merging
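We merge the `votes` and `proposals` dataframes by matching `votes.proposal` to `proposals.id`. Columns that exist in both tables receive the `_vote` or `_proposal` suffix, which is where names used later, such as `id_proposal` and `space_vote`, come from.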
```python
votes_proposals = votes.merge(proposals, left_on='proposal', right_on='id', suffixes=('_vote', '_proposal'))
```
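To see how the suffixes produce the column names used in later steps, here is a small sketch on made-up data. The toy frames `votes_demo` and `proposals_demo` and their values are invented for illustration, and the `validate="many_to_one"` argument is an optional safeguard that the original merge does not use.

```python
import pandas as pd

# Toy stand-ins for the real tables; values are invented
votes_demo = pd.DataFrame({
    "id": ["v1", "v2", "v3"],
    "proposal": ["p1", "p1", "p2"],
    "space": ["dao-a", "dao-a", "dao-a"],
})
proposals_demo = pd.DataFrame({
    "id": ["p1", "p2"],
    "space": ["dao-a", "dao-a"],
    "type": ["weighted", "basic"],
})

merged_demo = votes_demo.merge(
    proposals_demo,
    left_on="proposal",
    right_on="id",
    suffixes=("_vote", "_proposal"),
    validate="many_to_one",  # raises if a proposal id appears more than once
)
print(merged_demo.columns.tolist())
```

Only the columns shared by both frames (`id` and `space`) are suffixed; `proposal` and `type` keep their names.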

#### 5. Filtering by Voting Type
We focus only on the rows where the voting type is "weighted".
```python
votes_weighted = votes_proposals[votes_proposals['type'] == 'weighted'].copy()
```

#### 6. Determine Winning Choice
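We define a function `get_weighted_winning_choice` that determines the winning choice for a proposal. It takes the list of raw `choice` values cast on that proposal, parses each one as a mapping from choice to weight, sums the weights per choice, and returns the choice with the highest total. A value that cannot be parsed as a mapping is counted as a single vote of weight 1 for that raw value. The body is elided here; the full definition is in the code cell that follows this walkthrough.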
```python
def get_weighted_winning_choice(choices_list):
    ...
```
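To make the aggregation concrete, the sketch below tallies three made-up ballots. It assumes each weighted `choice` is stored as a dict-like string mapping an option to a weight. The helper `tally_weights` is hypothetical, and it parses with `ast.literal_eval`, a swapped-in technique rather than the notebook's quote replacement followed by `json.loads`.

```python
import ast
from collections import defaultdict


def tally_weights(choices_list):
    """Sum the weight given to each option across a list of raw ballots."""
    totals = defaultdict(float)
    for raw in choices_list:
        try:
            parsed = ast.literal_eval(raw) if isinstance(raw, str) else raw
        except (ValueError, SyntaxError):
            parsed = raw  # not a Python literal: keep the raw value
        if isinstance(parsed, dict):
            for option, weight in parsed.items():
                totals[str(option)] += float(weight)
        else:
            # Non-mapping ballot: count it as one unit of weight
            totals[str(parsed)] += 1.0
    return dict(totals)


# Invented ballots in the assumed storage format
ballots = ["{'1': 60, '2': 40}", "{'2': 100}", "{'1': 30, '3': 70}"]
totals = tally_weights(ballots)
winner = max(totals, key=totals.get)
print(totals, winner)
```

Like the notebook's function, this sums the raw weights as written on each ballot; it does not normalize them per voter or scale them by voting power.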

Then, we apply this function to our dataframe, grouping by proposal ID.
```python
winning_choices_weighted = votes_weighted.groupby('id_proposal').apply(lambda group: get_weighted_winning_choice(group['choice'].tolist()))
votes_weighted['winning_choice'] = votes_weighted['id_proposal'].map(winning_choices_weighted)
```

#### 7. Alignment Analysis
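We flag each vote as aligned (1) or not (0) by testing whether its raw `choice` value is exactly equal to the proposal's winning choice. Because the comparison is an exact match, a ballot stored as a multi-option weight mapping does not equal a single winning option and is flagged 0, even when most of its weight went to the winner.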
```python
votes_weighted['aligned'] = (votes_weighted['choice'] == votes_weighted['winning_choice']).astype(int)
```
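If ballots are stored as weight mappings, one possible alternative is to call a vote aligned when its highest-weighted option matches the winner. This is a sketch of a different definition, not the one used in this analysis; the helper `top_choice` and the sample ballots are hypothetical.

```python
import ast


def top_choice(raw_choice):
    """Return the option carrying the largest weight in one ballot, as a string."""
    try:
        parsed = ast.literal_eval(raw_choice) if isinstance(raw_choice, str) else raw_choice
    except (ValueError, SyntaxError):
        parsed = raw_choice  # not a Python literal: keep the raw value
    if isinstance(parsed, dict) and parsed:
        return str(max(parsed, key=parsed.get))
    return str(parsed)


winning_choice = "2"
ballots = ["{'1': 60, '2': 40}", "{'2': 100}", "2"]  # invented examples

# Exact-match rule, as in the step above
aligned_exact = [int(b == winning_choice) for b in ballots]
# Alternative rule: compare the ballot's top-weighted option
aligned_top = [int(top_choice(b) == winning_choice) for b in ballots]
print(aligned_exact, aligned_top)
```

The two rules disagree on the second ballot, which puts all of its weight on the winning option but is not string-equal to it.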

#### 8. Lag Variable for Previous Alignment
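For each voter within a specific DAO (`space_vote`), we shift the `aligned` column by one row to create a lag variable holding the alignment of the voter's previous vote. The shift follows the dataframe's current row order, so it yields the chronologically previous vote only if the rows are already ordered by time. A voter's first vote in a DAO has no predecessor and gets `NaN`.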
```python
votes_weighted['previous_aligned'] = votes_weighted.groupby(['voter', 'space_vote'])['aligned'].shift()
```
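A minimal sketch of making the ordering explicit before the shift, on invented data. The timestamp column name `created_vote` is an assumption; use whichever column records when each vote was cast.

```python
import pandas as pd

# Toy data, deliberately out of chronological order for voter "a"
demo = pd.DataFrame({
    "voter": ["a", "a", "a", "b", "b"],
    "space_vote": ["dao-x"] * 5,
    "created_vote": [30, 10, 20, 10, 20],  # hypothetical timestamp column
    "aligned": [1, 0, 1, 1, 0],
})

# Sort so that shift() really refers to the chronologically previous vote
demo = demo.sort_values(["voter", "space_vote", "created_vote"])
demo["previous_aligned"] = demo.groupby(["voter", "space_vote"])["aligned"].shift()
print(demo)
```

Each voter's earliest vote ends up with `NaN` in `previous_aligned`.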

#### 9. Misalignment Analysis
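We create a binary column that is 1 when the previous vote was misaligned and 0 otherwise. Since `NaN == 0` evaluates to false, a voter's first vote in a DAO (no previous vote) is also coded 0, the same as a vote that follows an aligned one.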
```python
votes_weighted['misaligned_previous'] = (votes_weighted['previous_aligned'] == 0).astype(int)
```
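The following sketch shows that coding on a toy series, along with one possible way to separate "no previous vote" from "previous vote aligned" by keeping only rows that have a history. The restriction is an illustrative option, not a step in the original analysis.

```python
import numpy as np
import pandas as pd

# Toy lag column: NaN marks a voter's first vote
previous_aligned = pd.Series([np.nan, 0.0, 1.0, np.nan, 1.0])

# Definition used above: NaN compares unequal to 0, so first votes become 0
misaligned_flag = (previous_aligned == 0).astype(int)

# Optional: keep only rows where a previous vote actually exists
has_history = previous_aligned.notna()
misaligned_with_history = misaligned_flag[has_history]
print(misaligned_flag.tolist(), misaligned_with_history.tolist())
```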

#### 10. Future Voting Behavior Analysis
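We flag whether the voter has a later row within the same DAO by shifting `choice` up by one row and checking that the result is not missing. As with the lag, this relies on row order, and the last recorded vote of each voter in a DAO is always 0. Since the data was filtered to weighted proposals, "subsequent" here means a subsequent weighted proposal.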
```python
votes_weighted['future_voting'] = votes_weighted.groupby(['voter', 'space_vote'])['choice'].shift(-1).notna().astype(int)
```
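A small sketch of the lead variable on invented data, showing which rows receive 1 and which receive 0:

```python
import pandas as pd

# Toy data: voter "a" votes twice in the DAO, voter "b" once
demo = pd.DataFrame({
    "voter": ["a", "a", "b"],
    "space_vote": ["dao-x", "dao-x", "dao-x"],
    "choice": ["1", "2", "1"],
})

# shift(-1) pulls the next row's choice within each (voter, DAO) group
next_choice = demo.groupby(["voter", "space_vote"])["choice"].shift(-1)
demo["future_voting"] = next_choice.notna().astype(int)
print(demo)
```

Voter "b" has a single vote, so its only row is coded 0, as is the final vote of voter "a".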

#### 11. Regression Analysis
Finally, we fit a logistic regression of `future_voting` on `misaligned_previous` (plus an intercept) to examine the relationship between a voter's past misalignment and their likelihood of participating in future votes.
```python
X = votes_weighted[['misaligned_previous']]
X = sm.add_constant(X)  # Adds a constant term to the predictor
y = votes_weighted['future_voting']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())
```
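As a reading aid for the summary this code prints, the sketch below redoes some of its arithmetic using figures copied from the output reported in this notebook: the intercept, the two log-likelihoods, and the number of observations. It converts the intercept from log-odds to a probability and recomputes McFadden's pseudo R-squared, which is what statsmodels reports as `Pseudo R-squ.`. The variable names are illustrative.

```python
import math

# Figures copied from the Logit summary printed by this notebook
const_coef = -0.5216
log_likelihood = -80888.0
ll_null = -91252.0
n_obs = 136356


def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Predicted probability of future voting when misaligned_previous == 0
baseline_prob = sigmoid(const_coef)

# McFadden's pseudo R-squared: 1 - LL / LL-Null
pseudo_r2 = 1.0 - log_likelihood / ll_null

# Average negative log-likelihood per observation ("Current function value")
mean_neg_ll = -log_likelihood / n_obs

print(f"{baseline_prob:.3f} {pseudo_r2:.4f} {mean_neg_ll:.4f}")
```

Under this reading, the intercept corresponds to a baseline probability of roughly 0.37, and the recomputed pseudo R-squared agrees with the reported 0.1136 up to the rounding of the displayed log-likelihoods.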

In [1]:
import pandas as pd
import json
from sqlalchemy import create_engine
import statsmodels.api as sm

# Create an engine to the database
engine = create_engine("mysql+pymysql://root:password@localhost/snapshot_database")

# Read in the votes and proposals tables
votes = pd.read_sql('SELECT * FROM votes', con=engine)
proposals = pd.read_sql('SELECT * FROM proposals', con=engine)

# Merge votes and proposals
votes_proposals = votes.merge(proposals, left_on='proposal', right_on='id', suffixes=('_vote', '_proposal'))

# Filter to only weighted voting
votes_weighted = votes_proposals[votes_proposals['type'] == 'weighted'].copy()

def get_weighted_winning_choice(choices_list):
    total_weights = {}
    for choice_str in choices_list:
        try:
            choice_dict = json.loads(choice_str.replace("'", "\""))
            for choice, weight in choice_dict.items():
                total_weights[choice] = total_weights.get(choice, 0) + weight
        except Exception:
            # Unparseable ballot: count the raw value as a single unit of weight
            choice = choice_str
            total_weights[choice] = total_weights.get(choice, 0) + 1
    return max(total_weights, key=total_weights.get)

winning_choices_weighted = votes_weighted.groupby('id_proposal').apply(lambda group: get_weighted_winning_choice(group['choice'].tolist()))
votes_weighted['winning_choice'] = votes_weighted['id_proposal'].map(winning_choices_weighted)

# Determine if a vote was aligned with the winning choice
votes_weighted['aligned'] = (votes_weighted['choice'] == votes_weighted['winning_choice']).astype(int)

# Create a lag variable for previous alignment
votes_weighted['previous_aligned'] = votes_weighted.groupby(['voter', 'space_vote'])['aligned'].shift()

# Indicate if the previous vote was misaligned
votes_weighted['misaligned_previous'] = (votes_weighted['previous_aligned'] == 0).astype(int)

# Indicate if the voter voted in a subsequent proposal within the same DAO
votes_weighted['future_voting'] = votes_weighted.groupby(['voter', 'space_vote'])['choice'].shift(-1).notna().astype(int)

# Regression analysis
X = votes_weighted[['misaligned_previous']]
X = sm.add_constant(X)  # Adds a constant term to the predictor
y = votes_weighted['future_voting']

model = sm.Logit(y, X)
result = model.fit()
print(result.summary())

Optimization terminated successfully.
         Current function value: 0.593208
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:          future_voting   No. Observations:               136356
Model:                          Logit   Df Residuals:                   136354
Method:                           MLE   Df Model:                            1
Date:                Wed, 06 Sep 2023   Pseudo R-squ.:                  0.1136
Time:                        06:58:09   Log-Likelihood:                -80888.
converged:                       True   LL-Null:                       -91252.
Covariance Type:            nonrobust   LLR p-value:                     0.000
                          coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------
const                  -0.5216      0.009    -58.230      0.000      -0.539      -0.504
mi