# Introduction

Do higher film budgets lead to more box office revenue? Let's find out if there's a relationship using the movie budgets and financial performance data that I've scraped from [the-numbers.com](https://www.the-numbers.com/movie/budgets) on **May 1st, 2018**. 

<img src=https://i.imgur.com/kq7hrEh.png>

# Import Statements

In [26]:
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns


# Notebook Presentation

In [2]:
pd.options.display.float_format = '{:,.2f}'.format

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Read the Data

In [3]:
data = pd.read_csv('cost_revenue_dirty.csv')

# Explore and Clean the Data

**Challenge**: Answer these questions about the dataset:
1. How many rows and columns does the dataset contain?
2. Are there any NaN values present?
3. Are there any duplicate rows?
4. What are the data types of the columns?

In [4]:
data.shape
data.isna().values.any()
data.duplicated().values.any()
type(data)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5391 entries, 0 to 5390
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Rank                   5391 non-null   int64 
 1   Release_Date           5391 non-null   object
 2   Movie_Title            5391 non-null   object
 3   USD_Production_Budget  5391 non-null   object
 4   USD_Worldwide_Gross    5391 non-null   object
 5   USD_Domestic_Gross     5391 non-null   object
dtypes: int64(1), object(5)
memory usage: 252.8+ KB


### Data Type Conversions

**Challenge**: Convert the `USD_Production_Budget`, `USD_Worldwide_Gross`, and `USD_Domestic_Gross` columns to a numeric format by removing `$` signs and `,`. 
<br>
<br>
Note that *domestic* in this context refers to the United States.

In [5]:
columns_to_clean = ['USD_Production_Budget', 'USD_Worldwide_Gross', 'USD_Domestic_Gross']
chars_to_remove = ['$',',']

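for col in columns_to_clean:
    for char in chars_to_remove:
        # regex=False treats '$' as a literal character rather than a regex anchor
        data[col] = data[col].astype(str).str.replace(char, "", regex=False)
    # Convert inside the loop so that every column becomes numeric, not only the last one
    data[col] = pd.to_numeric(data[col])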

data.head()



| | Rank | Release_Date | Movie_Title | USD_Production_Budget | USD_Worldwide_Gross | USD_Domestic_Gross |
|---|---|---|---|---|---|---|
| 0 | 5293 | 8/2/1915 | The Birth of a Nation | 110000 | 11000000 | 10000000 |
| 1 | 5140 | 5/9/1916 | Intolerance | 385907 | 0 | 0 |
| 2 | 5230 | 12/24/1916 | 20,000 Leagues Under the Sea | 200000 | 8000000 | 8000000 |
| 3 | 5299 | 9/17/1920 | Over the Hill to the Poorhouse | 100000 | 3000000 | 3000000 |
| 4 | 5222 | 1/1/1925 | The Big Parade | 245000 | 22000000 | 11000000 |


**Challenge**: Convert the `Release_Date` column to a Pandas Datetime type. 

In [6]:
data.Release_Date = pd.to_datetime(data.Release_Date)
data

| | Rank | Release_Date | Movie_Title | USD_Production_Budget | USD_Worldwide_Gross | USD_Domestic_Gross |
|---|---|---|---|---|---|---|
| 0 | 5293 | 1915-08-02 | The Birth of a Nation | 110000 | 11000000 | 10000000 |
| 1 | 5140 | 1916-05-09 | Intolerance | 385907 | 0 | 0 |
| 2 | 5230 | 1916-12-24 | 20,000 Leagues Under the Sea | 200000 | 8000000 | 8000000 |
| 3 | 5299 | 1920-09-17 | Over the Hill to the Poorhouse | 100000 | 3000000 | 3000000 |
| 4 | 5222 | 1925-01-01 | The Big Parade | 245000 | 22000000 | 11000000 |
| ... | ... | ... | ... | ... | ... | ... |
| 5386 | 2950 | 2018-10-08 | Meg | 15000000 | 0 | 0 |
| 5387 | 126 | 2018-12-18 | Aquaman | 160000000 | 0 | 0 |
| 5388 | 96 | 2020-12-31 | Singularity | 175000000 | 0 | 0 |
| 5389 | 1119 | 2020-12-31 | Hannibal the Conqueror | 50000000 | 0 | 0 |


### Descriptive Statistics

**Challenge**: 

1. What is the average production budget of the films in the data set?
2. What is the average worldwide gross revenue of films?
3. What were the minimums for worldwide and domestic revenue?
4. Are the bottom 25% of films actually profitable or do they lose money?
5. What are the highest production budget and highest worldwide gross revenue of any film?
6. How much revenue did the lowest and highest budget films make?

In [7]:
data.describe()

| | Rank | USD_Domestic_Gross |
|---|---|---|
| count | 5391.0 | 5391.0 |
| mean | 2696.0 | 41235519.44 |
| std | 1556.39 | 66029346.27 |
| min | 1.0 | 0.0 |
| 25% | 1348.5 | 1330901.5 |
| 50% | 2696.0 | 17192205.0 |
| 75% | 4043.5 | 52343687.0 |
| max | 5391.0 | 936662225.0 |


# Investigating the Zero Revenue Films

**Challenge**: How many films grossed $0 domestically (i.e., in the United States)? What were the highest budget films that grossed nothing?

In [8]:
zero_domestic = data[data.USD_Domestic_Gross == 0]
zero_domestic.sort_values('USD_Production_Budget', ascending=False)

| | Rank | Release_Date | Movie_Title | USD_Production_Budget | USD_Worldwide_Gross | USD_Domestic_Gross |
|---|---|---|---|---|---|---|
| 4526 | 3500 | 2013-12-31 | Re-Kill | 9500000 | 0 | 0 |
| 4743 | 4955 | 2014-12-08 | Jesse | 950000 | 0 | 0 |
| 3817 | 4954 | 2010-12-31 | Trance | 950000 | 0 | 0 |
| 4689 | 4953 | 2014-10-01 | Banshee Chapter | 950000 | 78122 | 0 |
| 4843 | 4956 | 2015-03-03 | Ask Me Anything | 950000 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 4163 | 5306 | 2012-05-18 | Indie Game: The Movie | 100000 | 0 | 0 |
| 4536 | 5308 | 2013-12-31 | Echo Dr. | 100000 | 0 | 0 |
| 4783 | 5307 | 2014-12-31 | Dude, Where's My Dog | 100000 | 0 | 0 |
| 5028 | 5312 | 2015-10-11 | The Night Visitor | 100000 | 0 | 0 |


**Challenge**: How many films grossed $0 worldwide? What are the highest budget films that had no revenue internationally?

In [9]:
zero_worldwide = data[data.USD_Worldwide_Gross == 0]
zero_worldwide.sort_values('USD_Production_Budget', ascending=False)

| | Rank | Release_Date | Movie_Title | USD_Production_Budget | USD_Worldwide_Gross | USD_Domestic_Gross |
|---|---|---|---|---|---|---|


### Filtering on Multiple Conditions

In [16]:
international_releases = data.loc[(data.USD_Domestic_Gross == 0) & (data.USD_Worldwide_Gross != 0)]
international_releases.sample(20)

| | Rank | Release_Date | Movie_Title | USD_Production_Budget | USD_Worldwide_Gross | USD_Domestic_Gross |
|---|---|---|---|---|---|---|
| 5112 | 5221 | 2015-12-31 | 4-Nov | 250000 | 0 | 0 |
| 3421 | 4931 | 2009-07-17 | The Poker House | 1000000 | 0 | 0 |
| 4408 | 4501 | 2013-06-21 | Alien Uprising | 2500000 | 0 | 0 |
| 5083 | 5327 | 2015-12-22 | The Brain That Wouldn't Die | 60000 | 0 | 0 |
| 4766 | 4637 | 2014-12-31 | Lucky Dog | 2000000 | 0 | 0 |
| 3607 | 4834 | 2010-03-23 | Lake Mungo | 1100000 | 0 | 0 |
| 5386 | 2950 | 2018-10-08 | Meg | 15000000 | 0 | 0 |
| 3494 | 3015 | 2009-10-27 | Stan Helsing: A Parody | 14000000 | 1553556 | 0 |
| 4304 | 5098 | 2012-12-31 | El rey de Najayo | 500000 | 0 | 0 |
| 4430 | 4658 | 2013-07-26 | Stranded | 1900000 | 285593 | 0 |


**Challenge**: Use the [`.query()` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html) to accomplish the same thing. Create a subset of international releases that had some worldwide gross revenue but made zero revenue in the United States.

Hint: This time you'll have to use the `and` keyword.
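
The hint can be sketched as follows. This is a minimal example on a small stand-in DataFrame (the rows in `demo` are hypothetical, not taken from the dataset) so that it runs on its own; in the notebook the same query string would be applied to `data`.

```python
import pandas as pd

# Hypothetical stand-in rows; in the notebook, use `data` instead.
demo = pd.DataFrame({
    'Movie_Title': ['Film A', 'Film B', 'Film C'],
    'USD_Worldwide_Gross': [0, 1_500_000, 40_000_000],
    'USD_Domestic_Gross': [0, 0, 25_000_000],
})

# Inside the query string, conditions are combined with the `and` keyword
# rather than the `&` operator used with boolean masks.
international_releases = demo.query('USD_Domestic_Gross == 0 and USD_Worldwide_Gross != 0')
print(international_releases)
```

Only the film with no domestic gross but some worldwide gross survives the filter, which matches the `.loc[]` version with `&`.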

| | Rank | Release_Date | Movie_Title | USD_Production_Budget | USD_Worldwide_Gross | USD_Domestic_Gross |
|---|---|---|---|---|---|---|
| 1 | 5140 | 1916-05-09 | Intolerance | 385907 | 0 | 0 |
| 6 | 4630 | 1927-12-08 | Wings | 2000000 | 0 | 0 |
| 8 | 4240 | 1930-01-01 | Hell's Angels | 4000000 | 0 | 0 |
| 17 | 4814 | 1936-10-20 | Charge of the Light Brigade, The | 1200000 | 0 | 0 |
| 27 | 4789 | 1941-10-28 | How Green Was My Valley | 1250000 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 5386 | 2950 | 2018-10-08 | Meg | 15000000 | 0 | 0 |
| 5387 | 126 | 2018-12-18 | Aquaman | 160000000 | 0 | 0 |
| 5388 | 96 | 2020-12-31 | Singularity | 175000000 | 0 | 0 |
| 5389 | 1119 | 2020-12-31 | Hannibal the Conqueror | 50000000 | 0 | 0 |


### Unreleased Films

**Challenge**:
* Identify which films had not yet been released at the time of data collection (May 1st, 2018).
* How many films in the dataset had not yet had a chance to be screened at the box office?
* Create another DataFrame called `data_clean` that does not include these films.

In [22]:
# Date of Data Collection
scrape_date = pd.Timestamp('2018-5-1')
future_releases = data[data.Release_Date >= scrape_date]
future_releases

| | Rank | Release_Date | Movie_Title | USD_Production_Budget | USD_Worldwide_Gross | USD_Domestic_Gross |
|---|---|---|---|---|---|---|
| 5384 | 321 | 2018-09-03 | A Wrinkle in Time | 103000000 | 0 | 0 |
| 5385 | 366 | 2018-10-08 | Amusement Park | 100000000 | 0 | 0 |
| 5386 | 2950 | 2018-10-08 | Meg | 15000000 | 0 | 0 |
| 5387 | 126 | 2018-12-18 | Aquaman | 160000000 | 0 | 0 |
| 5388 | 96 | 2020-12-31 | Singularity | 175000000 | 0 | 0 |
| 5389 | 1119 | 2020-12-31 | Hannibal the Conqueror | 50000000 | 0 | 0 |
| 5390 | 2517 | 2020-12-31 | Story of Bonnie and Clyde, The | 20000000 | 0 | 0 |


### Films that Lost Money

**Challenge**: 
What is the percentage of films where the production costs exceeded the worldwide gross revenue? 

In [25]:
data_clean = data.drop(future_releases.index)
money_losing = data_clean.loc[data_clean.USD_Production_Budget > data_clean.USD_Worldwide_Gross]
len(money_losing)/len(data_clean)

money_losing = data_clean.query('USD_Production_Budget > USD_Worldwide_Gross')
money_losing.shape[0]/data_clean.shape[0]

0.5009286775631501

# Seaborn for Data Viz: Bubble Charts

In [None]:
plt.figure(figsize=(8,4), dpi=200)
 
# ax = sns.scatterplot(data=data_clean,
#                      x='USD_Production_Budget', 
#                      y='USD_Worldwide_Gross')
 
# ax.set(ylim=(0, 3000000000),
#        xlim=(0, 450000000),
#        ylabel='Revenue in $ billions',
#        xlabel='Budget in $100 millions')

sns.scatterplot(data=data_clean,
                x='USD_Production_Budget', 
                y='USD_Worldwide_Gross')
 
plt.show()

### Plotting Movie Releases over Time

**Challenge**: Try to create the following Bubble Chart:

<img src=https://i.imgur.com/8fUn9T6.png>



# Converting Years to Decades Trick

**Challenge**: Create a column in `data_clean` that has the decade of the release. 

<img src=https://i.imgur.com/0VEfagw.png width=650> 

Here's how: 
1. Create a [`DatetimeIndex` object](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html) from the `Release_Date` column.
2. Grab all the years from the `DatetimeIndex` object using the `.year` property.
<img src=https://i.imgur.com/5m06Ach.png width=650>
3. Use floor division `//` to convert the year data to the decades of the films.
4. Add the decades as a `Decade` column to the `data_clean` DataFrame.
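
The four steps above can be sketched as follows. The `demo` DataFrame is a hypothetical stand-in for `data_clean` so that the snippet runs on its own; the only assumption is that `Release_Date` already has a datetime type.

```python
import pandas as pd

# Hypothetical stand-in for data_clean with a datetime Release_Date column.
demo = pd.DataFrame({
    'Release_Date': pd.to_datetime(['1915-08-02', '1969-12-31', '1999-03-31', '2017-12-15'])
})

dt_index = pd.DatetimeIndex(demo['Release_Date'])  # step 1: build the DatetimeIndex
years = dt_index.year                               # step 2: extract the years
decades = (years // 10) * 10                        # step 3: e.g. 1999 // 10 = 199, then 199 * 10 = 1990
demo['Decade'] = decades                            # step 4: add the Decade column
print(demo)
```

Floor division drops the last digit of the year, and multiplying by 10 puts a zero back in its place.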

### Separate the "Old" (up to 1969) and "New" (1970 onwards) Films

**Challenge**: Create two new DataFrames: `old_films` and `new_films`
* `old_films` should include all the films released up to and including 1969
* `new_films` should include all the films from 1970 onwards
* How many films were released prior to 1970?
* What was the most expensive film made prior to 1970?
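
One way to approach this, assuming the `Decade` column from the previous section exists, is to filter on the decade. The `demo` DataFrame and its film names are hypothetical stand-ins for `data_clean`.

```python
import pandas as pd

# Hypothetical stand-in for data_clean, including the Decade column.
demo = pd.DataFrame({
    'Movie_Title': ['Film A', 'Film B', 'Film C', 'Film D'],
    'USD_Production_Budget': [2_000_000, 42_000_000, 15_000_000, 160_000_000],
    'Decade': [1920, 1960, 1970, 2010],
})

old_films = demo[demo['Decade'] <= 1960]  # released up to and including 1969
new_films = demo[demo['Decade'] > 1960]   # released from 1970 onwards

# Number of films released prior to 1970
print(len(old_films))

# Most expensive film released prior to 1970
most_expensive_old = old_films.sort_values('USD_Production_Budget', ascending=False).head(1)
print(most_expensive_old)
```

Filtering on `Decade` rather than on the release date keeps the boundary unambiguous: the 1960s bucket covers every release through the end of 1969.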

# Seaborn Regression Plots

**Challenge**: Use Seaborn's `.regplot()` to show the scatter plot and linear regression line against the `new_films`. 
<br>
<br>
Style the chart

* Put the chart on a `'darkgrid'`.
* Set limits on the axes so that they don't show negative values.
* Label the axes on the plot "Revenue in \$ billions" and "Budget in \$ millions".
* Provide HEX colour codes for the plot and the regression line. Make the dots dark blue (#2f4b7c) and the line orange (#ff7c43).

Interpret the chart

* Do our data points for the new films align better or worse with the linear regression than for our older films?
* Roughly how much would a film with a budget of $150 million make according to the regression line?
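
A possible styling for this chart is sketched below. The `demo` DataFrame is a hypothetical stand-in for `new_films`, and the axis limits are borrowed from the commented-out scatter plot earlier in this notebook, so treat them as an assumption rather than the required values.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for new_films; in the notebook, pass new_films itself.
demo = pd.DataFrame({
    'USD_Production_Budget': [10e6, 50e6, 100e6, 200e6, 300e6],
    'USD_Worldwide_Gross': [20e6, 150e6, 280e6, 700e6, 900e6],
})

plt.figure(figsize=(8, 4), dpi=200)

# The style context applies to axes created inside the `with` block.
with sns.axes_style('darkgrid'):
    ax = sns.regplot(data=demo,
                     x='USD_Production_Budget',
                     y='USD_Worldwide_Gross',
                     color='#2f4b7c',                 # dark blue dots
                     scatter_kws={'alpha': 0.3},
                     line_kws={'color': '#ff7c43'})   # orange regression line

    # Lower limits of 0 keep negative values off both axes.
    ax.set(ylim=(0, 3_000_000_000),
           xlim=(0, 450_000_000),
           ylabel='Revenue in $ billions',
           xlabel='Budget in $ millions')

plt.show()
```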

# Run Your Own Regression with scikit-learn

$$ \widehat{\text{REVENUE}} = \theta_0 + \theta_1 \, \text{BUDGET} $$

**Challenge**: Run a linear regression for the `old_films`. Calculate the intercept, slope and r-squared. How much of the variance in movie revenue does the linear model explain in this case?
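
A minimal sketch of the regression with scikit-learn's `LinearRegression` follows. The `demo` DataFrame is a hypothetical stand-in for `old_films` with made-up figures, so the printed numbers say nothing about the real dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for old_films; the figures are made up for illustration.
demo = pd.DataFrame({
    'USD_Production_Budget': [1_000_000, 2_000_000, 3_000_000, 4_000_000],
    'USD_Worldwide_Gross': [3_000_000, 5_000_000, 7_000_000, 9_000_000],
})

# scikit-learn expects 2-D inputs, so keep the feature and target as DataFrames.
X = pd.DataFrame(demo, columns=['USD_Production_Budget'])
y = pd.DataFrame(demo, columns=['USD_Worldwide_Gross'])

regression = LinearRegression()
regression.fit(X, y)

intercept = regression.intercept_[0]  # theta_0
slope = regression.coef_[0][0]        # theta_1
r_squared = regression.score(X, y)    # share of the variance in revenue explained by budget
print(intercept, slope, r_squared)
```

The value returned by `.score()` is the r-squared, which answers the question about how much of the variance the linear model explains.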

# Use Your Model to Make a Prediction

We just estimated the slope and intercept! Remember that our Linear Model has the following form:

$$ \widehat{\text{REVENUE}} = \theta_0 + \theta_1 \, \text{BUDGET} $$

**Challenge**:  How much global revenue does our model estimate for a film with a budget of $350 million? 
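
The estimate is a direct application of the formula above. In the sketch below, `theta_0` and `theta_1` are placeholder values chosen for illustration only; substitute the intercept and slope from your own fitted regression.

```python
# Placeholder coefficients for illustration only; replace them with the
# intercept and slope estimated by your own regression.
theta_0 = 1_000_000  # hypothetical intercept
theta_1 = 2.0        # hypothetical slope

budget = 350_000_000
revenue_estimate = theta_0 + theta_1 * budget
print(f'Estimated revenue: ${revenue_estimate:,.0f}')
```

Keep in mind that a budget of $350 million may lie far outside the range of budgets the model was fitted on, in which case the estimate is an extrapolation.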

In [1]:
import sha3

STOP = 0x00
ADD = 0x01
MUL = 0x02
SUB = 0x03
DIV = 0x04
SDIV = 0x05
MOD = 0x06
SMOD = 0x07
ADDMOD = 0x08
MULMOD = 0x09
EXP = 0x0A
SIGNEXTEND = 0x0B
LT = 0x10
GT = 0x11
SLT = 0x12
SGT = 0x13
EQ = 0x14
ISZERO = 0x15
AND = 0x16
OR = 0x17
XOR = 0x18
NOT = 0x19
BYTE = 0x1A
SHL = 0x1B
SHR = 0x1C
SAR = 0x1D
SHA3 = 0x20
ADDRESS = 0x30
BALANCE = 0x31
ORIGIN = 0x32
CALLER = 0x33
CALLVALUE = 0x34
CALLDATALOAD = 0x35
CALLDATASIZE = 0x36
CALLDATACOPY = 0x37
CODESIZE = 0x38
CODECOPY = 0x39
GASPRICE = 0x3A
EXTCODESIZE = 0x3B
EXTCODECOPY = 0x3C
EXTCODEHASH = 0x3F
BLOCKHASH = 0x40
COINBASE = 0x41
TIMESTAMP = 0x42
NUMBER = 0x43
PREVRANDAO = 0x44
GASLIMIT = 0x45
CHAINID = 0x46
SELFBALANCE = 0x47
BASEFEE = 0x48
PUSH0 = 0x5F
PUSH1 = 0x60
PUSH32 = 0x7F
DUP1 = 0x80
DUP16 = 0x8F
SWAP1 = 0x90
SWAP16 = 0x9F
POP = 0x50
MLOAD = 0x51
MSTORE = 0x52
MSTORE8 = 0x53
SLOAD = 0x54
SSTORE = 0x55
JUMP = 0x56
JUMPI = 0x57
PC = 0x58
MSIZE = 0x59
JUMPDEST = 0x5B
LOG0 = 0xA0
LOG1 = 0xA1
LOG2 = 0xA2
LOG3 = 0xA3
LOG4 = 0xA4
RETURN = 0xF3
RETURNDATASIZE = 0x3D
RETURNDATACOPY = 0x3E
CALL = 0xF1
CALLCODE = 0xF2
DELEGATECALL = 0xF4
STATICCALL = 0xFA
REVERT = 0xFD
INVALID = 0xFE

account_db = {
    '0x9bbfed6889322e016e0a02ee459d306fc19545d8': {
        'balance': 100, # wei
        'nonce': 1, 
        'storage': {},
        'code': b''
    },
    '0x1000000000000000000000000000000000000c42': {
        'balance': 0, # wei
        'nonce': 0, 
        'storage': {},
        'code': b'\x60\x42\x60\x00\x52\x60\x01\x60\x1f\xf3'  # PUSH1 0x42 PUSH1 0 MSTORE PUSH1 1 PUSH1 31 RETURN
    },

    # ... other account data ...
}

class Transaction:
    def __init__(self, to = '', value = 0, data = '', caller='0x00', origin='0x00', thisAddr='0x00', gasPrice=1, gasLimit=21000, nonce=0, v=0, r=0, s=0):
        self.nonce = nonce
        self.gasPrice = gasPrice
        self.gasLimit = gasLimit
        self.to = to
        self.value = value
        self.data = data
        self.caller = caller
        self.origin = origin
        self.thisAddr = thisAddr
        self.v = v
        self.r = r
        self.s = s

class StopException(Exception):
    pass

class Log:
    def __init__(self, address, data, topics=None):
        self.address = address
        self.data = data
        self.topics = topics if topics is not None else []  # avoid a shared mutable default

    def __str__(self):
        return f'Log(address={self.address}, data={self.data}, topics={self.topics})'

class EVM:
    def __init__(self, code, txn = None, is_static=False):
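        self.code = code # bytecode to execute, a bytes object
        self.is_static = is_static
        self.pc = 0  # program counter starts at 0
        self.stack = [] # the stack starts empty
        self.memory = bytearray()  # memory starts empty
        self.storage = {}  # storage starts as an empty dict
        self.validJumpDest = {}  # valid JUMPDEST positions, filled by findValidJumpDestinations()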
        self.success = True
        self.txn = txn
        self.logs = []
        self.returnData = bytearray()
        self.current_block = {
            "blockhash": 0x7527123fc877fe753b3122dc592671b4902ebf2b325dd2c7224a43c0cbeee3ca,
            "coinbase": 0x388C818CA8B9251b393131C08a736A67ccB19297,
            "timestamp": 1625900000,
            "number": 17871709,
            "prevrandao": 0xce124dee50136f3f93f19667fb4198c6b94eecbacfa300469e5280012757be94,
            "gaslimit": 30,
            "chainid": 1,
            "selfbalance": 100,
            "basefee": 30,
        }

    def next_instruction(self):
        op = self.code[self.pc]  # fetch the current instruction
        self.pc += 1  # advance the program counter
        return op

    def findValidJumpDestinations(self):
        # make sure a JUMPDEST byte is not actually part of a PUSH argument
        pc = 0

        while pc < len(self.code):
            op = self.code[pc]
            if op == JUMPDEST:
                self.validJumpDest[pc] = True
            elif op >= PUSH1 and op <= PUSH32:
                pc += op - PUSH1 + 1
            pc += 1

    def push(self, size):
        data = self.code[self.pc:self.pc + size] # read `size` bytes from the code
        value = int.from_bytes(data, 'big') # convert the bytes to an int
        self.stack.append(value) # push onto the stack
        self.pc += size # advance pc by `size` bytes

    def pop(self):
        if len(self.stack) == 0:
            raise Exception('Stack underflow')
        return self.stack.pop() # pop the top of the stack

    def add(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        res = (a + b) % (2**256) # reduce modulo 2^256 so the sum wraps around on overflow
        self.stack.append(res)
        
    def mul(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        res = (a * b) % (2**256) # reduce modulo 2^256 so the product wraps around on overflow
        self.stack.append(res)

    def sub(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        res = (a - b) % (2**256) # reduce modulo 2^256 so the difference wraps around on underflow
        self.stack.append(res)

    def div(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        if b == 0:  # division by zero yields 0 in the EVM
            res = 0
        else:
            res =  (a // b) % (2**256)
        self.stack.append(res)

    def sdiv(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
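        # Interpret both operands as 256-bit two's-complement signed integers
        a = a - 2**256 if a >= 2**255 else a
        b = b - 2**256 if b >= 2**255 else b
        if b == 0:  # division by zero yields 0 in the EVM
            res = 0
        else:
            q = abs(a) // abs(b)  # SDIV truncates toward zero
            res = (q if (a < 0) == (b < 0) else -q) % (2**256)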
        self.stack.append(res)

    def mod(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        res = a % b if b != 0 else 0  # modulo by zero yields 0 in the EVM
        self.stack.append(res)

    def smod(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
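        # Interpret both operands as 256-bit two's-complement signed integers
        a = a - 2**256 if a >= 2**255 else a
        b = b - 2**256 if b >= 2**255 else b
        if b == 0:  # modulo by zero yields 0 in the EVM
            res = 0
        else:
            r = abs(a) % abs(b)  # the result takes the sign of the dividend
            res = (-r if a < 0 else r) % (2**256)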
        self.stack.append(res)

    def addmod(self):
        if len(self.stack) < 3:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        n = self.stack.pop()
        res = (a + b) % n if n != 0 else 0
        self.stack.append(res)

    def mulmod(self):
        if len(self.stack) < 3:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        n = self.stack.pop()
        res = (a * b) % n if n != 0 else 0
        self.stack.append(res)

    def exp(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        res = pow(a, b, 2**256)  # three-argument pow reduces modulo 2^256 at every step
        self.stack.append(res)
        
    def signextend(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        b = self.stack.pop()
        x = self.stack.pop()
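        if b < 32: # for b >= 32 there is nothing to extend
            sign_bit = 1 << (8 * b + 7) # mask for the sign bit of byte b, counting from the least significant byte
            mask = (1 << (8 * (b + 1))) - 1  # keeps the low b+1 bytes of x
            if x & sign_bit:  # negative: set every bit above the sign bit
                x = (x | ~mask) % (2**256)
            else:  # non-negative: clear every bit above the sign bit
                x = x & mask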
        self.stack.append(x)
        
    def lt(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        self.stack.append(int(a < b)) # a is the top of the stack; LT pushes 1 when a < b

    def gt(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        self.stack.append(int(a > b)) # a is the top of the stack; GT pushes 1 when a > b

    def slt(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
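        a = a - 2**256 if a >= 2**255 else a  # reinterpret as signed 256-bit integers
        b = b - 2**256 if b >= 2**255 else b
        self.stack.append(int(a < b)) # signed comparison; a is the top of the stack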

    def sgt(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
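        a = a - 2**256 if a >= 2**255 else a  # reinterpret as signed 256-bit integers
        b = b - 2**256 if b >= 2**255 else b
        self.stack.append(int(a > b)) # signed comparison; a is the top of the stack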

    def eq(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        self.stack.append(int(a == b))

    def iszero(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        self.stack.append(int(a == 0))

    def and_op(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        self.stack.append(a & b)

    def or_op(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        self.stack.append(a | b)

    def xor_op(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        self.stack.append(a ^ b)

    def not_op(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        self.stack.append(~a % (2**256)) # reduce modulo 2^256 so the bitwise NOT stays within 256 bits

    def byte_op(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        position = self.stack.pop()
        value = self.stack.pop()
        if position >= 32:
            res = 0
        else:
            res = (value // pow(256, 31 - position)) & 0xFF
        self.stack.append(res)

    def shl(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        self.stack.append((b << a) % (2**256)) # reduce the left shift modulo 2^256
    
    def shr(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
        self.stack.append(b >> a) # logical right shift
        
    def sar(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        a = self.stack.pop()
        b = self.stack.pop()
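        b = b - 2**256 if b >= 2**255 else b  # reinterpret as signed so the shift preserves the sign
        self.stack.append((b >> a) % (2**256)) # arithmetic right shift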

    def mstore(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        offset = self.stack.pop()
        value = self.stack.pop()
        while len(self.memory) < offset + 32:
            self.memory.append(0) # expand memory
        self.memory[offset:offset+32] = value.to_bytes(32, 'big')

    def mstore8(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        offset = self.stack.pop()
        value = self.stack.pop()
        while len(self.memory) < offset + 32:
            self.memory.append(0) # expand memory
        self.memory[offset] = value & 0xFF # keep the least significant byte

    def mload(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        offset = self.stack.pop()
        while len(self.memory) < offset + 32:
            self.memory.append(0) # expand memory
        value = int.from_bytes(self.memory[offset:offset+32], 'big')
        self.stack.append(value)

    def sload(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        key = self.stack.pop()
        value = self.storage.get(key, 0) # return 0 if the key does not exist
        self.stack.append(value)

    def sstore(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        key = self.stack.pop()
        value = self.stack.pop()
        self.storage[key] = value

    def jump(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        destination = self.stack.pop()
        if destination not in self.validJumpDest:
            raise Exception('Invalid jump destination')
        else:  self.pc = destination
        

    def jumpi(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        destination = self.stack.pop()
        condition = self.stack.pop()
        if condition != 0:
            if destination not in self.validJumpDest:
                raise Exception('Invalid jump destination')
            else:  self.pc = destination

    def pc(self):
        self.stack.append(self.pc)

    def msize(self):
        self.stack.append(len(self.memory))

    def jumpdest(self):
        pass

    def blockhash(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        number = self.stack.pop()
        # a real client would need access to historical block hashes
        if number == self.current_block["number"]:
            self.stack.append(self.current_block["blockhash"])
        else:
            self.stack.append(0)  # return 0 for anything other than the current block

    def coinbase(self):
        self.stack.append(self.current_block["coinbase"])

    def timestamp(self):
        self.stack.append(self.current_block["timestamp"])

    def number(self):
        self.stack.append(self.current_block["number"])
        
    def prevrandao(self):
        self.stack.append(self.current_block["prevrandao"])
        
    def gaslimit(self):
        self.stack.append(self.current_block["gaslimit"])

    def chainid(self):
        self.stack.append(self.current_block["chainid"])

    def selfbalance(self):
        self.stack.append(self.current_block["selfbalance"])

    def basefee(self):
        self.stack.append(self.current_block["basefee"])

    def dup(self, position):
        if len(self.stack) < position:
            raise Exception('Stack underflow')
        value = self.stack[-position]
        self.stack.append(value)

    def swap(self, position):
        if len(self.stack) < position + 1:
            raise Exception('Stack underflow')
        idx1, idx2 = -1, -position - 1
        self.stack[idx1], self.stack[idx2] = self.stack[idx2], self.stack[idx1]
        
    def sha3(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')

        offset = self.pop()
        size = self.pop()
        data = self.memory[offset:offset+size]  # read the data from memory
        hash_value = int.from_bytes(sha3.keccak_256(data).digest(), 'big')  # compute the Keccak-256 hash
        self.stack.append(hash_value)  # push the hash onto the stack

    def balance(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        addr_int = self.stack.pop()
        # convert the int from the stack to bytes, then to a hex string
        addr_str = '0x' + addr_int.to_bytes(20, byteorder='big').hex()
        self.stack.append(account_db.get(addr_str, {}).get('balance', 0))

    def extcodesize(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        addr_int = self.stack.pop()
        # convert the int from the stack to bytes, then to a hex string, to look up the account database
        addr_str = '0x' + addr_int.to_bytes(20, byteorder='big').hex()
        self.stack.append(len(account_db.get(addr_str, {}).get('code', b'')))

    def extcodecopy(self):
        # make sure the stack holds enough items
        if len(self.stack) < 4:
            raise Exception('Stack underflow')
        addr_int = self.stack.pop()
        mem_offset = self.stack.pop()
        code_offset = self.stack.pop()
        length = self.stack.pop()
        # convert the int from the stack to bytes, then to a hex string, to look up the account database
        addr_str = '0x' + addr_int.to_bytes(20, byteorder='big').hex()
        code = account_db.get(addr_str, {}).get('code', b'')[code_offset:code_offset+length]
        while len(self.memory) < mem_offset + length:
            self.memory.append(0)

        self.memory[mem_offset:mem_offset+length] = code

    def extcodehash(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        addr_int = self.stack.pop()
        # convert the int from the stack to bytes, then to a hex string, to look up the account database
        addr_str = '0x' + addr_int.to_bytes(20, byteorder='big').hex()
        code = account_db.get(addr_str, {}).get('code', b'')
        code_hash = int.from_bytes(sha3.keccak_256(code).digest(), 'big')  # compute the Keccak-256 hash
        self.stack.append(code_hash)

    def address(self):
        self.stack.append(self.txn.thisAddr)

    def origin(self):
        self.stack.append(self.txn.origin)

    def caller(self):
        self.stack.append(self.txn.caller)

    def callvalue(self):
        self.stack.append(self.txn.value)

    def calldataload(self):
        if len(self.stack) < 1:
            raise Exception('Stack underflow')
        offset = self.stack.pop()
        # convert the hex string to bytes
        calldata_bytes = bytes.fromhex(self.txn.data[2:])  # assumes a '0x' prefix
        data = bytearray(32)
        # copy the calldata, leaving zeros past its end
        for i in range(32):
            if offset + i < len(calldata_bytes):
                data[i] = calldata_bytes[offset + i]
        self.stack.append(int.from_bytes(data, 'big'))

    def calldatasize(self):
        # Assuming calldata is a hex string with a '0x' prefix
        size = (len(self.txn.data) - 2) // 2
        self.stack.append(size)

    def calldatacopy(self):
        # make sure the stack holds enough items
        if len(self.stack) < 3:
            raise Exception('Stack underflow')
        mem_offset = self.stack.pop()
        calldata_offset = self.stack.pop()
        length = self.stack.pop()

        # expand memory
        if len(self.memory) < mem_offset + length:
            self.memory.extend([0] * (mem_offset + length - len(self.memory)))

        # convert the hex string to bytes
        calldata_bytes = bytes.fromhex(self.txn.data[2:])  # assumes a '0x' prefix

        # copy the calldata into memory
        for i in range(length):
            if calldata_offset + i < len(calldata_bytes):
                self.memory[mem_offset + i] = calldata_bytes[calldata_offset + i]

    def codesize(self):
        addr = self.txn.thisAddr
        self.stack.append(len(account_db.get(addr, {}).get('code', b'')))

    def codecopy(self):
        if len(self.stack) < 3:
            raise Exception('Stack underflow')

        mem_offset = self.stack.pop()
        code_offset = self.stack.pop()
        length = self.stack.pop()

        # get the code stored at the current address
        addr = self.txn.thisAddr
        code = account_db.get(addr, {}).get('code', b'')

        # expand memory
        if len(self.memory) < mem_offset + length:
            self.memory.extend([0] * (mem_offset + length - len(self.memory)))

        # copy the code into memory
        for i in range(length):
            if code_offset + i < len(code):
                self.memory[mem_offset + i] = code[code_offset + i]
            
    def gasprice(self):
        self.stack.append(self.txn.gasPrice)

    def log(self, num_topics):
        if len(self.stack) < 2 + num_topics:
            raise Exception('Stack underflow')

        mem_offset = self.stack.pop()
        length = self.stack.pop()
        topics = [self.stack.pop() for _ in range(num_topics)]

        data = self.memory[mem_offset:mem_offset + length]
        log_entry = {
            "address": self.txn.thisAddr,
            "data": data.hex(),
            "topics": [f"0x{topic:064x}" for topic in topics]
        }
        self.logs.append(log_entry)

    def return_op(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')

        mem_offset = self.stack.pop()
        length = self.stack.pop()

        # expand memory
        if len(self.memory) < mem_offset + length:
            self.memory.extend([0] * (mem_offset + length - len(self.memory)))

        self.returnData = self.memory[mem_offset:mem_offset + length]      

    def returndatasize(self):
        self.stack.append(len(self.returnData))

    def returndatacopy(self):
        if len(self.stack) < 3:
            raise Exception('Stack underflow')

        mem_offset = self.stack.pop()
        return_offset = self.stack.pop()
        length = self.stack.pop()

        if return_offset + length > len(self.returnData):
            raise Exception("Invalid returndata size")

        # expand memory
        if len(self.memory) < mem_offset + length:
            self.memory.extend([0] * (mem_offset + length - len(self.memory)))

        # copy using slices
        self.memory[mem_offset:mem_offset + length] = self.returnData[return_offset:return_offset + length]


    def call(self):
        if len(self.stack) < 7:
            raise Exception('Stack underflow')
            
        gas = self.stack.pop()
        to_addr = self.stack.pop()
        value = self.stack.pop()
        mem_in_start = self.stack.pop()
        mem_in_size = self.stack.pop()
        mem_out_start = self.stack.pop()
        mem_out_size = self.stack.pop()
        
        if self.is_static and value != 0:
            self.success = False
            raise Exception("State changing operation detected during STATICCALL!")

        # expand memory
        if len(self.memory) < mem_in_start + mem_in_size:
            self.memory.extend([0] * (mem_in_start + mem_in_size - len(self.memory)))

        # read the input data from memory
        data = self.memory[mem_in_start: mem_in_start + mem_in_size]
    
        account_source = account_db[self.txn.caller]
        account_target = account_db[hex(to_addr)]
        
        # Check the caller's balance
        if account_source['balance'] < value:
            self.success = False
            print("Insufficient balance for the transaction!")
            self.stack.append(0)
            return

        # Update balances
        account_source['balance'] -= value
        account_target['balance'] += value

        # Build the call context from txn
        ctx = Transaction(to=hex(to_addr), 
                          data=data,
                          value=value,
                          caller=self.txn.thisAddr, 
                          origin=self.txn.origin, 
                          thisAddr=hex(to_addr), 
                          gasPrice=self.txn.gasPrice, 
                          gasLimit=self.txn.gasLimit, 
                         )
        
        # Create the child EVM environment
        evm_call = EVM(account_target['code'], ctx)
        evm_call.run()
        
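        # Expand memory if needed
        if len(self.memory) < mem_out_start + mem_out_size:
            self.memory.extend([0] * (mem_out_start + mem_out_size - len(self.memory)))

        # Copy at most mem_out_size bytes so the slice assignment cannot resize memory
        ret = evm_call.returnData[:mem_out_size]
        self.memory[mem_out_start: mem_out_start + len(ret)] = ret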
        
        if evm_call.success:
            self.stack.append(1)  
        else:
            self.stack.append(0)  


    def delegatecall(self):
        if len(self.stack) < 6:
            raise Exception('Stack underflow')
        
        gas = self.stack.pop()
        to_addr = self.stack.pop()
        mem_in_start = self.stack.pop()
        mem_in_size = self.stack.pop()
        mem_out_start = self.stack.pop()
        mem_out_size = self.stack.pop()
        
        # Expand memory if needed
        if len(self.memory) < mem_in_start + mem_in_size:
            self.memory.extend([0] * (mem_in_start + mem_in_size - len(self.memory)))

        # Read the input data from memory
        data = self.memory[mem_in_start: mem_in_start + mem_in_size]
    
        account_target = account_db[hex(to_addr)]
        
        # Create the child EVM environment. Note that the context is the original calling contract, not the target contract
        evm_delegate = EVM(account_target['code'], self.txn)
        evm_delegate.storage = self.storage
        # Run the code
        evm_delegate.run()
        
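        # Expand memory if needed
        if len(self.memory) < mem_out_start + mem_out_size:
            self.memory.extend([0] * (mem_out_start + mem_out_size - len(self.memory)))

        # Copy at most mem_out_size bytes so the slice assignment cannot resize memory
        ret = evm_delegate.returnData[:mem_out_size]
        self.memory[mem_out_start: mem_out_start + len(ret)] = ret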
        
        if evm_delegate.success:
            self.stack.append(1)  
        else:
            self.stack.append(0)  
            print("Delegatecall execution failed!")


    def callcode(self):
        self.stack.append(0)  
        print("CALLCODE is not supported!")

    def is_state_changing_opcode(self, opcode): # Check for opcodes that a static call must not execute
        state_changing_opcodes = [
            0xF0, # CREATE
            0xF5, # CREATE2
            0xFF, # SELFDESTRUCT
            0xA0, # LOG0
            0xA1, # LOG1
            0xA2, # LOG2
            0xA3, # LOG3
            0xA4, # LOG4
            0x55  # SSTORE
        ]
        return opcode in state_changing_opcodes

    def staticcall(self):
        if len(self.stack) < 6:
            raise Exception('Stack underflow')
            
        gas = self.stack.pop()
        to_addr = self.stack.pop()
        mem_in_start = self.stack.pop()
        mem_in_size = self.stack.pop()
        mem_out_start = self.stack.pop()
        mem_out_size = self.stack.pop()
        
        # Expand memory if needed
        if len(self.memory) < mem_in_start + mem_in_size:
            self.memory.extend([0] * (mem_in_start + mem_in_size - len(self.memory)))

        # Read the input data from memory
        data = self.memory[mem_in_start: mem_in_start + mem_in_size]
    
        account_target = account_db[hex(to_addr)]
        
        # Build the call context from txn
        ctx = Transaction(to=hex(to_addr), 
                          data=data,
                          value=0,
                          caller=self.txn.thisAddr, 
                          origin=self.txn.origin, 
                          thisAddr=hex(to_addr), 
                          gasPrice=self.txn.gasPrice, 
                          gasLimit=self.txn.gasLimit, 
                         )
        
        # Create the child EVM environment
        evm_staticcall = EVM(account_target['code'], ctx, is_static=True)
        # Run the code
        evm_staticcall.run()
        
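        # Expand memory if needed
        if len(self.memory) < mem_out_start + mem_out_size:
            self.memory.extend([0] * (mem_out_start + mem_out_size - len(self.memory)))

        # Copy at most mem_out_size bytes so the slice assignment cannot resize memory
        ret = evm_staticcall.returnData[:mem_out_size]
        self.memory[mem_out_start: mem_out_start + len(ret)] = ret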
        
        if evm_staticcall.success:
            self.stack.append(1)  
        else:
            self.stack.append(0)  

    def revert(self):
        if len(self.stack) < 2:
            raise Exception('Stack underflow')
        mem_offset = self.stack.pop()
        length = self.stack.pop()

        # Expand memory if needed
        if len(self.memory) < mem_offset + length:
            self.memory.extend([0] * (mem_offset + length - len(self.memory)))

        self.returnData = self.memory[mem_offset:mem_offset+length]
        self.success = False
 
    def invalid(self):
        self.success = False

    def run(self):
        while self.pc < len(self.code) and self.success:
            op = self.next_instruction()

            if self.is_static and self.is_state_changing_opcode(op):
                self.success = False
                raise Exception("State changing operation detected during STATICCALL!")

            if PUSH1 <= op <= PUSH32: # PUSH1-PUSH32
                size = op - PUSH1 + 1
                self.push(size)
            elif op == PUSH0: # PUSH0
                self.stack.append(0)
            elif DUP1 <= op <= DUP16: # DUP1-DUP16
                position = op - DUP1 + 1
                self.dup(position)
            elif SWAP1 <= op <= SWAP16: # SWAP1-SWAP16
                position = op - SWAP1 + 1
                self.swap(position)
            elif op == POP: # POP
                self.pop()
            elif op == ADD: # Handle ADD
                self.add()
            elif op == MUL: # Handle MUL
                self.mul()
            elif op == SUB: # Handle SUB
                self.sub()
            elif op == DIV: # Handle DIV
                self.div()
            elif op == SDIV:
                self.sdiv()
            elif op == MOD:
                self.mod()
            elif op == SMOD:
                self.smod()
            elif op == ADDMOD:
                self.addmod()
            elif op == MULMOD:
                self.mulmod()
            elif op == EXP:
                self.exp()
            elif op == SIGNEXTEND:
                self.signextend()
            elif op == LT:
                self.lt()
            elif op == GT:
                self.gt()
            elif op == SLT:
                self.slt()
            elif op == SGT:
                self.sgt()
            elif op == EQ:
                self.eq()
            elif op == ISZERO:
                self.iszero()
            elif op == AND:  # Handle AND
                self.and_op()
            elif op == OR:  # Handle OR
                self.or_op()
            elif op == XOR:  # Handle XOR
                self.xor_op()
            elif op == NOT:  # Handle NOT
                self.not_op()
            elif op == BYTE:  # Handle BYTE
                self.byte_op()
            elif op == SHL:  # Handle SHL
                self.shl()
            elif op == SHR:  # Handle SHR
                self.shr()
            elif op == SAR:  # Handle SAR
                self.sar()
            elif op == MLOAD: # Handle MLOAD
                self.mload()
            elif op == MSTORE: # Handle MSTORE
                self.mstore()
            elif op == MSTORE8: # Handle MSTORE8
                self.mstore8()
            elif op == SLOAD:
                self.sload()
            elif op == SSTORE: # Handle SSTORE
                self.sstore()
            elif op == MSIZE: # Handle MSIZE
                self.msize()
            elif op == JUMP: 
                self.jump()
            elif op == JUMPDEST: 
                self.jumpdest()
            elif op == JUMPI: 
                self.jumpi()
            elif op == STOP: # Handle STOP
                print('Program has been stopped')
                break
            elif op == PC:
                self.pc()
            elif op == BLOCKHASH:
                self.blockhash()
            elif op == COINBASE:
                self.coinbase()
            elif op == TIMESTAMP:
                self.timestamp()
            elif op == NUMBER:
                self.number()
            elif op == PREVRANDAO:
                self.prevrandao()
            elif op == GASLIMIT:
                self.gaslimit()
            elif op == CHAINID:
                self.chainid()
            elif op == SELFBALANCE:
                self.selfbalance()
            elif op == BASEFEE:
                self.basefee()        
            elif op == SHA3: # SHA3
                self.sha3()
            elif op == BALANCE: 
                self.balance()
            elif op == EXTCODESIZE: 
                self.extcodesize()
            elif op == EXTCODECOPY: 
                self.extcodecopy()
            elif op == EXTCODEHASH: 
                self.extcodehash()
            elif op == ADDRESS: 
                self.address()
            elif op == ORIGIN: 
                self.origin()
            elif op == CALLER: 
                self.caller()
            elif op == CALLVALUE: 
                self.callvalue()
            elif op == CALLDATALOAD: 
                self.calldataload()
            elif op == CALLDATASIZE: 
                self.calldatasize()
            elif op == CALLDATACOPY: 
                self.calldatacopy()
            elif op == CODESIZE: 
                self.codesize()
            elif op == CODECOPY: 
                self.codecopy()
            elif op == GASPRICE: 
                self.gasprice()
            elif op == LOG0:
                self.log(0)
            elif op == LOG1:
                self.log(1)
            elif op == LOG2:
                self.log(2)
            elif op == LOG3:
                self.log(3)
            elif op == LOG4:
                self.log(4)
            elif op == RETURN:
                self.return_op()
            elif op == RETURNDATASIZE:
                self.returndatasize()
            elif op == RETURNDATACOPY:
                self.returndatacopy()
            elif op == REVERT:
                self.revert() 
            elif op == CALL:
                self.call()                       
            elif op == CALLCODE:
                self.callcode()            
            elif op == DELEGATECALL:
                self.delegatecall()            
            elif op == STATICCALL:
                self.staticcall()     
            else:
                self.invalid()
                raise Exception('Invalid opcode')
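
`call`, `delegatecall`, and `staticcall` above all end with the same two steps: grow the caller's memory to cover the output range, then copy the callee's return data into it. The sketch below shows one way that step could be factored out. It assumes memory is a `bytearray` (the class may store it differently), and `expand_memory` and `write_return_data` are hypothetical helper names that are not part of the `EVM` class.

```python
def expand_memory(memory: bytearray, offset: int, length: int) -> None:
    """Zero-extend memory so that memory[offset:offset + length] is addressable."""
    end = offset + length
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))


def write_return_data(memory: bytearray, out_start: int, out_size: int,
                      return_data: bytes) -> None:
    """Copy at most out_size bytes of return_data into memory at out_start."""
    expand_memory(memory, out_start, out_size)
    # Truncate first: a slice assignment with mismatched lengths would resize memory.
    chunk = return_data[:out_size]
    memory[out_start:out_start + len(chunk)] = chunk


memory = bytearray()
write_return_data(memory, 0x1f, 1, b"\x42")
print(len(memory), hex(int.from_bytes(memory[0:32], "big")))
```

Truncating the return data before the slice assignment keeps the memory length unchanged when the callee returns more or fewer bytes than the caller asked for.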


In [2]:
# Define Txn
addr = '0x9bbfed6889322e016e0a02ee459d306fc19545d8'
txn = Transaction(to=addr, value=10, data='0x9059cbb20000000000000000000000009bbfed6889322e016e0a02ee459d306fc19545d80000000000000000000000000000000000000000000000000000000000000001', 
                  caller=addr, origin=addr, thisAddr=addr)
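
The `data` field above appears to follow the usual ABI calldata layout: a 4-byte selector followed by 32-byte argument words. As a rough illustration under that assumption, the sketch below rebuilds a string of the same shape and splits it. `split_calldata` is a hypothetical helper, not something the notebook defines.

```python
def split_calldata(data_hex: str) -> tuple[str, list[str]]:
    """Split ABI-style calldata into a 4-byte selector and 32-byte words."""
    raw = bytes.fromhex(data_hex[2:] if data_hex.startswith("0x") else data_hex)
    selector = "0x" + raw[:4].hex()
    words = ["0x" + raw[i:i + 32].hex() for i in range(4, len(raw), 32)]
    return selector, words


addr = '0x9bbfed6889322e016e0a02ee459d306fc19545d8'
# Same shape as the cell above: selector + left-padded address word + uint256 word
data = '0x9059cbb2' + addr[2:].rjust(64, '0') + format(1, '064x')

selector, words = split_calldata(data)
print(selector)
print(int(words[0], 16) == int(addr, 16))
print(int(words[1], 16))
```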

In [3]:
# Staticcall
code = b"\x60\x01\x60\x1f\x5f\x5f\x73\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x42\x5f\xfA\x5f\x51"
evm = EVM(code, txn)
evm.run()
print(hex(evm.stack[-2]))
# output: 0x1 (success)
print(hex(evm.stack[-1]))
# output: 0x42

0x1
0x42
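
To see how the bytecode in the cell above lines up the STATICCALL arguments, it can help to disassemble it. The sketch below is a tiny standalone decoder that knows only the opcodes this example uses. `disassemble` and `OPCODE_NAMES` are hypothetical names, and the bytecode is rebuilt from hex on the assumption that the PUSH20 operand is the 20-byte address `0x10…0c42`.

```python
# Opcode values from the EVM instruction set; only the ones this example needs.
OPCODE_NAMES = {0x5F: "PUSH0", 0x51: "MLOAD", 0xFA: "STATICCALL"}


def disassemble(code: bytes) -> list[str]:
    """Decode bytecode into mnemonics, one entry per instruction."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            size = op - 0x60 + 1
            operand = code[pc:pc + size]
            pc += size
            out.append(f"PUSH{size} 0x{operand.hex()}")
        else:
            out.append(OPCODE_NAMES.get(op, f"UNKNOWN(0x{op:02x})"))
    return out


target = "10" + "00" * 17 + "0c42"  # 20-byte address operand (assumed)
code = bytes.fromhex("6001" + "601f" + "5f" + "5f" + "73" + target + "5f" + "fa" + "5f" + "51")

for line in disassemble(code):
    print(line)
```

Read in the order that `staticcall` pops its arguments, the call receives gas 0, the target address, an empty input range, and a 1-byte output range at offset `0x1f`. The returned byte therefore lands in the last byte of the first 32-byte word, which the final `MLOAD` at offset 0 reads back.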
