## Overview

CPython is the reference implementation of the Python programming language. Written in C and Python, it is
the default and most widely used implementation of the language. CPython is both a compiler and an
interpreter: it first compiles source code to bytecode and then interprets that bytecode.

There are several alternative implementations of Python, such as PyPy, Jython and IronPython. The cell below opens a web page that surveys them.

In [1]:
import webbrowser
webbrowser.open("https://hackr.io/blog/python-interpreters")

True
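
To check which implementation is running the current session, the standard library can be asked directly. This is a minimal sketch; the variable name `implementation` is only illustrative.

```python
import platform
import sys

# Name of the implementation executing this code, e.g. "CPython" or "PyPy"
implementation = platform.python_implementation()
print(implementation)

# The same information in lower-case form, plus the language version
print(sys.implementation.name)
print(platform.python_version())
```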

## CPython is both a Compiler and Interpreter

CPython is both a compiler and an interpreter. The compiler translates our source code into bytecode and the
interpreter executes that bytecode. The interpreter is a native program, so each operating system and processor
architecture needs its own build of it, while the bytecode for a given Python version is the same on every
platform. Most of the work in running a Python program is done by the interpreter. The compiler stage is
relatively simple.
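
The two stages can be driven separately from Python itself with the built-ins `compile` and `exec`. The sketch below is only an illustration; the source string, the file name `<example>` and the variable names are made up for the example.

```python
# Stage 1: the compiler turns source text into a code object holding bytecode
source = "result = 6 * 7"
code_object = compile(source, "<example>", "exec")
print(type(code_object).__name__)          # code
print(type(code_object.co_code).__name__)  # bytes

# Stage 2: the interpreter executes the bytecode in a namespace
namespace = {}
exec(code_object, namespace)
print(namespace["result"])                 # 42
```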

Here we have a function, add, that adds x and y. Note that the bytecode produced by the compiler is the same for
two integers as it is for two strings. All the work is done by the BINARY_ADD opcode (Python 3.11 and later emit
BINARY_OP instead). However, what we can't easily see is how BINARY_ADD works in the interpreter. It turns out
that BINARY_ADD handles the two cases very differently. But to see that, we need to look at the C code inside
CPython. Unfortunately the current code is too complicated to show here.

Older code from Python 2 was a little easier to understand. We reproduce it here to illustrate how the two cases
are handled very differently in the interpreter. Don't worry about the details, just observe that CPython 2 uses
PyInt for integers and PyString for strings. The first if checks for two integers (PyInt_CheckExact) and then adds
them inline, jumping to slow_add if the result overflows. The else if checks for two strings (PyString_CheckExact)
and then calls the function string_concatenate. Everything else is handed to the generic PyNumber_Add:
<pre>
   case BINARY_ADD:
            w = POP();
            v = TOP();
            if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
                /* INLINE: int + int */
                register long a, b, i;
                a = PyInt_AS_LONG(v);
                b = PyInt_AS_LONG(w);
                /* cast to avoid undefined behaviour
                   on overflow */
                i = (long)((unsigned long)a + b);
                if ((i^a) < 0 && (i^b) < 0)
                    goto slow_add;
                x = PyInt_FromLong(i);
            }
            else if (PyString_CheckExact(v) &&
                     PyString_CheckExact(w)) {
                x = string_concatenate(v, w, f, next_instr);
                /* string_concatenate consumed the ref to v */
                goto skip_decref_vx;
            }
            else {
              slow_add:
                x = PyNumber_Add(v, w);
            }
            Py_DECREF(v);
          skip_decref_vx:
            Py_DECREF(w);
            SET_TOP(x);
            if (x != NULL) continue;
            break;

</pre>
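
The branching in the C code can be mimicked in Python to see which path a given pair of operands would take. This is a rough sketch, not CPython's actual logic: the function `binary_add`, the path labels and the 64-bit bounds for a C long are assumptions made for illustration.

```python
# Assumption: a 64-bit C long, as on most 64-bit Unix-like builds of Python 2
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1

def binary_add(v, w):
    """Return (path_taken, result), following the branches of the C code."""
    if type(v) is int and type(w) is int:      # PyInt_CheckExact on both
        total = v + w
        if LONG_MIN <= total <= LONG_MAX:
            return "inline int", total
        return "slow_add (overflow)", total    # goto slow_add
    if type(v) is str and type(w) is str:      # PyString_CheckExact on both
        return "string_concatenate", v + w
    return "slow_add (generic)", v + w         # PyNumber_Add

for args in [(5, 7), ("ABC", "DEF"), (2**62, 2**62), (1.5, 2), (True, 1)]:
    print(args, "->", binary_add(*args))
```

Note that `True + 1` takes the generic path: `bool` is a subclass of `int`, and the exact-type check rejects subclasses.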


In [None]:

import dis

# the compiler generates bytecode for this function
# and the interpreter works out what to do with each opcode
def add(x, y):
    return x + y

print(add("ABC", "DEF"))
print(add(5, 7))

# look at the disassembled bytecode
dis.dis(add)

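# on Python 3.10 and earlier the disassembler generates the following
# (Python 3.11 and later show BINARY_OP in place of BINARY_ADD)
'''
  6           0 LOAD_FAST                0 (x)
              2 LOAD_FAST                1 (y)
              4 BINARY_ADD
              6 RETURN_VALUE
'''
# note: the bytecode is identical for str and int arguments;
# the interpreter decides how to add when BINARY_ADD executes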
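
Because the exact listing depends on the Python version, it can help to pull out just the opcode names with `dis.get_instructions`. The sketch below assumes only that the addition shows up as either BINARY_ADD (3.10 and earlier) or BINARY_OP (3.11 and later); the variable names are illustrative.

```python
import dis

def add(x, y):
    return x + y

# The function is compiled once, before any call, so this list cannot
# depend on whether add is later called with integers or strings
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)

# Pick out the opcode that performs the addition on this Python version
add_opcodes = [name for name in opnames if name in ("BINARY_ADD", "BINARY_OP")]
print(add_opcodes)
```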





In [None]:
'''
Inspecting Bytecode
===================

We can look at the bytecode for the function square in two ways.  The disassembler (dis) shows each instruction
in human-readable form.  The raw bytes are stored in square.__code__.co_code, and the list comprehension at the
end of the program prints them in hex.
'''

from math import sqrt
import dis

def square(x, y):
    # despite its name, this returns the length of the hypotenuse
    sq = sqrt(x**2 + y**2)
    return sq



# disassemble function square
dis.dis(square)

# now look at hex
print("\nprint the bytecode in hex:")
bytecode = [hex(byte) for byte in square.__code__.co_code]
print(", ".join(bytecode))
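
The hex dump and the disassembly describe the same bytes. Since Python 3.6 each instruction occupies two bytes (an opcode and an argument), so the raw bytes can be paired up and named with `dis.opname`. This is a simplified sketch: it does not combine EXTENDED_ARG prefixes, and on Python 3.11 and later some of the entries are inline CACHE slots rather than real instructions.

```python
import dis
from math import sqrt

def square(x, y):
    sq = sqrt(x**2 + y**2)
    return sq

raw = square.__code__.co_code

# Walk the bytes two at a time: first the opcode number, then its argument
decoded = []
for offset in range(0, len(raw), 2):
    opcode_number, argument = raw[offset], raw[offset + 1]
    decoded.append((offset, dis.opname[opcode_number], argument))

for offset, name, argument in decoded:
    print(f"{offset:4d}  {name:<20} {argument}")
```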



In [None]:
'''
As we know, CPython is written in C.  Opcodes are defined in a C header file called opcode.h.  Run this script to
see the first few opcodes.  The first few lines in the file are C boilerplate and irrelevant, so we use tail to
omit them.  The script assumes a Unix-like shell and a copy of opcode.h in the current directory.  This header
file is for Python 3.10.2.
'''

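import os

number_of_opcodes = 20
skip_lines = 9  # leading lines of C boilerplate in opcode.h
# head keeps the first (skip_lines + number_of_opcodes) lines,
# tail then keeps only the last number_of_opcodes of those
os.system(f"head -n {number_of_opcodes + skip_lines} opcode.h | tail -n {number_of_opcodes}")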
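
If opcode.h is not at hand, or the shell commands are unavailable, the opcode table of the running interpreter can be read from the standard library's `opcode` module instead. This is a sketch under that substitution: the numbers reflect whichever Python version runs the code, not necessarily 3.10.2, and the output layout only loosely follows the `#define NAME number` lines of the header.

```python
import opcode

number_of_opcodes = 20

# opcode.opmap maps each opcode name to its number; sort by number
by_number = sorted(opcode.opmap.items(), key=lambda item: item[1])
first_opcodes = by_number[:number_of_opcodes]

for name, number in first_opcodes:
    print(f"#define {name:<24} {number}")
```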
