# Advanced Frequency Analysis of Smart Contract Risks

Welcome to this interactive tutorial on conducting advanced frequency analysis using Python in Google Colab!

## Objective
The objective of this notebook is to teach you how to handle larger datasets and create dynamic visualizations to analyze the frequency of risk tags associated with smart contracts. This will involve loading data, performing data manipulation, and visualizing the results using Python libraries such as pandas, matplotlib, and seaborn.

## Before You Start
This notebook assumes you have some familiarity with basic programming concepts and a basic understanding of Python. If you are completely new to Python, I recommend reviewing Python basics before proceeding.

Let's get started by setting up our environment and loading the data!


### Step 1: Import libraries

In [1]:
# Import necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Display plots inline
%matplotlib inline



### Step 2: Download the dataset

This step downloads the Webacy Smart Contract Risk dataset. If you have your own dataset, upload it to Colab's environment instead.

In [2]:
!gdown 1andAuermOWqVXfhsh_AQ3Db93D3BIqgx

Downloading...
From: https://drive.google.com/uc?id=1andAuermOWqVXfhsh_AQ3Db93D3BIqgx
To: /content/compiled_risk_data.xlsx
100% 291k/291k [00:00<00:00, 68.9MB/s]


In [3]:
print("Setup complete. Imported pandas, seaborn, and matplotlib. Downloaded Webacy risk dataset.")

Setup complete. Imported pandas, seaborn, and matplotlib. Downloaded Webacy risk dataset.


### Step 3: Load the data

Although we have downloaded the dataset, we still need to load it into our Python environment. For this we will use the pandas library.

In [4]:
# Load the dataset from the location reported by the download step
df = pd.read_excel('/content/compiled_risk_data.xlsx')

# Display the first five rows of the dataframe
df.head()
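A mistyped or missing path is a common reason for `pd.read_excel` to raise `FileNotFoundError`, so it can help to check for the file before loading it. The sketch below is one way to do that. `load_risk_data` is a hypothetical helper name introduced here for illustration, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_risk_data(path):
    """Load the Excel file at `path`, failing early with a clear message.

    `load_risk_data` is a hypothetical helper introduced for illustration.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {file_path}. Re-run the download cell or "
            "upload the file to the Colab environment."
        )
    # pd.read_excel needs an Excel engine such as openpyxl for .xlsx files.
    return pd.read_excel(file_path)
```

In Colab, calling `load_risk_data('/content/compiled_risk_data.xlsx')` should return the dataframe, assuming the download in Step 2 wrote the file to that location.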

In [None]:
# We can then inspect other aspects of the data,
# for example the data types and null counts of each column

df.info()
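`df.info()` prints its summary rather than returning it. If you want the same information as a table you can sort or filter, one option is to build it from `dtypes` and `isna()`. The sketch below uses a small made-up dataframe (`toy`) in place of the real dataset; its values and the `address` column are assumptions for illustration only.

```python
import pandas as pd

# A tiny made-up frame standing in for the real dataset (illustrative values).
toy = pd.DataFrame({
    "is_airdrop_scam": [True, False, None, True],
    "buy_tax": [0.0, 5.0, 2.5, None],
    "address": ["0xabc", "0xdef", "0x123", "0x456"],  # hypothetical column
})

# One row per column: its dtype and how many values are missing.
summary = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "null_count": toy.isna().sum(),
})
print(summary.sort_values("null_count", ascending=False))
```

On the real data you would replace `toy` with `df`.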

### Step 4: Frequency analysis

In [None]:
# Let's now look at the value counts of an individual risk tag: is_airdrop_scam

df['is_airdrop_scam'].value_counts()

We see that over 50% of the dataset has `True` in the `is_airdrop_scam` column. Note that this is a dummy dataset; in the real world you would not expect that many scams, or at least we can hope not.
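To check a proportion like this directly, `value_counts` accepts `normalize=True`, which returns fractions of rows instead of raw counts. Here is a minimal sketch on a made-up Series; the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Made-up values standing in for df['is_airdrop_scam'].
flags = pd.Series([True, True, False, True, False], name="is_airdrop_scam")

counts = flags.value_counts()                 # raw counts per value
shares = flags.value_counts(normalize=True)   # fraction of rows per value

print(counts)
print(shares)
```

On the real data, `df['is_airdrop_scam'].value_counts(normalize=True)` gives the same kind of breakdown.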

Now, let's define all the risk columns in our dataset so that we can run the analysis on them.

In [None]:
risk_columns = ['Is_closed_source', 'hidden_owner', 'anti_whale_modifiable',
       'Is_anti_whale', 'Is_honeypot', 'buy_tax', 'sell_tax',
       'slippage_modifiable', 'Is_blacklisted', 'can_take_back_ownership',
       'owner_change_balance', 'is_airdrop_scam', 'selfdestruct', 'trust_list',
       'is_whitelisted', 'is_fake_token', 'illegal_unicode', 'exploitation',
       'bad_contract', 'reusing_state_variable', 'encode_packed_collision',
       'encode_packed_parameters', 'centralized_risk_medium',
       'centralized_risk_high', 'centralized_risk_low', 'event_setter',
       'external_dependencies', 'immutable_states',
       'reentrancy_without_eth_transfer', 'incorrect_inheritance_order',
       'shadowing_local', 'events_maths']
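The names in `risk_columns` mix capitalization styles (for example `Is_honeypot` next to `is_airdrop_scam`), and `df[risk_columns]` raises a `KeyError` if any name does not match a column exactly. A quick check before running the analysis might look like the sketch below. `find_missing_columns` is a hypothetical helper and the frame is made up.

```python
import pandas as pd


def find_missing_columns(frame, expected):
    """Return the names in `expected` that are not columns of `frame`.

    Hypothetical helper for illustration; the comparison is case-sensitive,
    matching how pandas looks up column labels.
    """
    return [name for name in expected if name not in frame.columns]


# Made-up frame: 'Is_honeypot' is present, so 'is_honeypot' would not match.
toy = pd.DataFrame({"Is_honeypot": [True, False], "sell_tax": [False, True]})

print(find_missing_columns(toy, ["Is_honeypot", "sell_tax"]))    # nothing missing
print(find_missing_columns(toy, ["is_honeypot", "hidden_owner"]))
```

With the real data, `find_missing_columns(df, risk_columns)` should return an empty list before you continue.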

Now that we know all the risk columns, let's do a full frequency analysis on these columns.

In [None]:
# Calculating the frequency of 'True' in each risk tag column
frequencies = df[risk_columns].apply(lambda x: x.value_counts()).loc[True]
frequencies = frequencies.fillna(0)  # Replace NaN with 0 for any column that may not have True values
frequencies
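An alternative that avoids the `.loc[True]` lookup is to compare each column with `True` and sum the result, since `True` counts as 1. This is a sketch on made-up data rather than the notebook's own method. It also gives relative frequencies through `mean()`.

```python
import pandas as pd

# Made-up risk flags (illustrative values only).
toy = pd.DataFrame({
    "Is_honeypot": [True, False, True, True],
    "hidden_owner": [False, False, False, False],   # no True values at all
    "selfdestruct": [True, None, False, True],      # contains a missing value
})

is_true = toy.eq(True)          # element-wise comparison; missing values become False
true_counts = is_true.sum()     # number of True values per column
true_share = is_true.mean()     # fraction of all rows that are True per column

print(true_counts)
print(true_share)
```

Note that `true_share` divides by the total number of rows, so rows with missing values count as "not True" in this sketch.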

Now that we have the frequencies, we can also visualize them using a bar chart.

In [None]:
# Visualizing the frequencies using a bar chart
sns.set_style("whitegrid")
plt.figure(figsize=(12, 8))
sns.barplot(x=frequencies.index, y=frequencies.values, palette='viridis')
plt.title('Frequency of True Values for Each Risk Tag')
plt.xlabel('Risk Tags')
plt.ylabel('Frequency of True')
plt.xticks(rotation=90)
plt.show()
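With more than thirty tags, rotated x-axis labels can be hard to read. One variation is to sort the frequencies and draw horizontal bars with matplotlib directly. The sketch below uses a made-up `example` Series in place of `frequencies`, and `plot_sorted_frequencies` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_sorted_frequencies(freqs):
    """Draw a horizontal bar chart of `freqs`, largest value at the top.

    Hypothetical helper for illustration; returns the matplotlib Axes.
    """
    # barh draws the first item at the bottom, so sort ascending.
    ordered = freqs.sort_values(ascending=True)
    fig, ax = plt.subplots(figsize=(8, 0.4 * len(ordered) + 1))
    ax.barh(list(ordered.index), list(ordered.values))
    ax.set_title("Frequency of True Values for Each Risk Tag")
    ax.set_xlabel("Frequency of True")
    ax.set_ylabel("Risk Tags")
    fig.tight_layout()
    return ax


# Made-up counts standing in for the `frequencies` Series.
example = pd.Series({"Is_honeypot": 12, "hidden_owner": 30, "selfdestruct": 5})
ax = plot_sorted_frequencies(example)
plt.show()
```

Scaling the figure height with the number of tags is a design choice that keeps each label on its own readable line.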


**Again, note that this is a dummy dataset. With a real dataset, your frequencies might be significantly lower.**

# Conclusion
Great job! You have successfully completed a frequency analysis of risk tags in smart contracts using Python. You've learned how to load data, perform calculations, and visualize the results using some of the most powerful libraries in Python.

## Next Steps
- Try modifying the charts or calculations to explore other aspects of the data.
- Consider analyzing the frequency of 'False' values or other specific conditions.
- Use this notebook as a template for analyzing other datasets.
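As a starting point for the second suggestion, the sketch below counts `False` values per column and then counts rows where two tags are both `True`. The data is made up, and pairing `Is_honeypot` with `hidden_owner` is only one example of a specific condition you might test.

```python
import pandas as pd

# Made-up risk flags (illustrative values only).
toy = pd.DataFrame({
    "Is_honeypot": [True, False, True, False],
    "hidden_owner": [True, False, False, False],
})

# Number of False values in each column.
false_counts = toy.eq(False).sum()

# Number of rows where both tags are True at the same time.
both_true = (toy["Is_honeypot"] & toy["hidden_owner"]).sum()

print(false_counts)
print(both_true)
```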

Remember, the skills you've learned here are applicable to a wide range of data analysis tasks. Keep practicing and exploring!

Thank you for following along, and happy coding!


## Tips for Further Learning
- Explore the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) for more advanced data manipulation techniques.
- Dive deeper into [matplotlib](https://matplotlib.org/stable/contents.html) and [seaborn](https://seaborn.pydata.org/) to discover more visualization styles and options.
- Participate in online forums and communities to enhance your learning and connect with other learners.
