# ISJ - introduction

## Classification of programming languages
* There are **various criteria**
* Based on:
  * http://codenugget.co/2015/03/05/declarative-vs-imperative-programming-web.html
  * https://en.wikibooks.org/wiki/Introduction_to_Programming_Languages/Programming_Language_Paradigms
  * https://www.quora.com/What-is-the-difference-between-a-compiler-and-an-interpreter


## Imperative vs. declarative
* **Imperative** programming is about telling your machine **how** to do something
* **Declarative** programming is about telling your machine **what** you would like to happen, and letting it figure out how to do it

## Imperative vs. declarative
* **Imperative** languages specify **explicit manipulation** of the computer's internal state; **procedural** languages specify an **explicit sequence of steps** to follow
* **Declarative** languages are **functional** or **relational** 
  * relationships between variables in terms of **functions** or **inference rules**
  * the language executor (interpreter or compiler) applies some fixed algorithm to these relations to produce a result

$$s = \sum_{x=1}^N x^2 = 1^2 + 2^2 + 3^2 + \dots + N^2$$
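
As a point of comparison for the formula above, here is a minimal Python sketch of both styles. The function names are illustrative and not part of the original slides.

```python
def sum_of_squares_imperative(n):
    """Spell out how: loop over 1..n and accumulate the total step by step."""
    total = 0
    for x in range(1, n + 1):
        total += x * x
    return total


def sum_of_squares_declarative(n):
    """State what: the sum of x squared for x from 1 to n."""
    return sum(x * x for x in range(1, n + 1))


print(sum_of_squares_imperative(5))   # 55
print(sum_of_squares_declarative(5))  # 55
```

Both functions compute the same value; the second one leaves the iteration and accumulation to `sum` and the generator expression.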

## Imperative code in Javascript

In [41]:
function sumOfSquares(nums) {
  var i, sum = 0, squares = [];
  for (i = 0; i < nums.length; i++) {
    squares.push(nums[i]*nums[i]);
  }

  for (i = 0; i < squares.length; i++) {
    sum += squares[i];
  }

  return sum;
}

console.log(sumOfSquares([1, 2, 3, 4, 5]));




## Declarative/functional code in Clojure

In [42]:
(defn square-of [n]
  (* n n))

(defn sum-of-squares [nums]
  (reduce + (map #(square-of %) nums)))


(sum-of-squares '(1 2 3 4 5))




## Functional code in Javascript

In [None]:
function sumOfSquares2(nums) {
  return nums
    .map(function(num) { return num * num; })
    .reduce(function(start, num) { return start + num; }, 0)
  ;
}

console.log(sumOfSquares2([1, 2, 3, 4, 5]));



## Declarative code - SQL

In [None]:
SELECT * 
FROM dogs
INNER JOIN owners
  ON dogs.owner_id = owners.id

## Imperative code for the same

In [None]:
//dogs = [{name: 'Fido', owner_id: 1}, {...}, ... ]
//owners = [{id: 1, name: 'Bob'}, {...}, ...]

var dogsWithOwners = []
var dog, owner

for(var di=0; di < dogs.length; di++) {
  dog = dogs[di]

  for(var oi=0; oi < owners.length; oi++) {
    owner = owners[oi]
    if (owner && dog.owner_id == owner.id) {
      dogsWithOwners.push({
        dog: dog,
        owner: owner
      })
    }
  }
}
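
For comparison, the same join can be sketched in Python with a comprehension, which reads closer to the SQL query than the nested loops do. The sample `dogs` and `owners` records below are made up for illustration.

```python
# Hypothetical sample data mirroring the shapes used in the comments above.
dogs = [
    {"name": "Fido", "owner_id": 1},
    {"name": "Rex", "owner_id": 2},
    {"name": "Stray", "owner_id": 99},  # no matching owner
]
owners = [
    {"id": 1, "name": "Bob"},
    {"id": 2, "name": "Alice"},
]

# Inner join: keep only the (dog, owner) pairs whose keys match.
dogs_with_owners = [
    {"dog": dog, "owner": owner}
    for dog in dogs
    for owner in owners
    if dog["owner_id"] == owner["id"]
]

for pair in dogs_with_owners:
    print(pair["dog"]["name"], "belongs to", pair["owner"]["name"])
```

The comprehension still visits every pair, so it is only a change of notation; a database is free to pick a faster join strategy.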

## Declarative code - Prolog

In [None]:
    man(adam). 
    man(peter). 
    man(paul). 
      
    woman(mary). 
    woman(eve). 

    parent(adam,peter). % means adam is parent of peter 
    parent(eve,peter). 
    parent(adam,paul). 
    parent(mary,paul). 

In [None]:
    son(S,P):-man(S),parent(P,S). 
    daughter(D,P):-woman(D),parent(P,D). 
        
    ?-son(X,adam). 
       
    siblings(A,B):-parent(P,A),parent(P,B),A\=B.
      
    uncle(U,N):-man(U),siblings(U,P),parent(P,N). 
     
    grand_parent(G,N):-parent(G,X),parent(X,N). 

    descendent(D,A):-parent(A,D). 
    descendent(D,A):-parent(P,D),descendent(P,A). 

## Compiled vs. interpreted
- Programming languages are **not** classified as **interpreted** or **compiled**
- Their particular **implementations** are
- People confuse **common implementation** methods with **necessary properties** of language implementations

## Examples
- **C** is typically **compiled**, but C interpreters exist
- **BASIC** was traditionally an **interpreted** language, but there are many BASIC compilers in use today

- Some languages lend themselves better to interpretation than to compilation, and vice versa
- But interpretation vs. compilation is **not considered a property of the language itself**

## Compiler
(in a narrow sense)
turns a **higher-level** program into a **native-binary** program

A **native-binary program** is:
- a bunch of instructions (cleverly called the **text** segment)
- a bunch of space for global data (named the **data** segment)
- a bunch of empty workspace for intermediate calculations (called the **stack**)
- a bunch of empty space to place stuff we don't know the size of before it's needed (called the **heap**)

![native-binary_program.png](native-binary_program.png)

- When a native-binary program is launched, the **operating system** loads it into memory and it becomes a **process**
- The components are the **principal parts** of a process

- The program runs by the CPU keeping a **counter** that points somewhere into the **text**; the instruction there tells it what to do to the **data** areas or to the instruction **counter**
- Afterwards it adjusts the instruction counter (usually just **increments** it) and **repeats** with the next instruction it points at

## Interpreter
- **Programs** can act **as processors** too
- An **interpreter** is a software-made processor
- It reads a bunch of general program instructions and makes them happen one by one while they are being read
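
To make the idea of a software-made processor concrete, here is a minimal sketch of an interpreter for a toy stack machine. The instruction set (`PUSH`, `ADD`, `MUL`, `JUMP_IF_ZERO`, `PRINT`) is invented for this example and does not correspond to any real machine.

```python
def run(program):
    """Interpret a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []      # workspace for intermediate values
    output = []     # values produced by PRINT
    pc = 0          # program counter: index of the next instruction
    while pc < len(program):
        opcode, arg = program[pc]
        pc += 1                      # default: move on to the next instruction
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "JUMP_IF_ZERO":
            if stack.pop() == 0:
                pc = arg             # a jump overwrites the counter instead
        elif opcode == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output


# (2 + 3) * 4
program = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None),
    ("PUSH", 4), ("MUL", None),
    ("PRINT", None),
]
print(run(program))  # [20]
```

The `pc` variable plays the role of the CPU's instruction counter: it is incremented after each instruction unless a jump sets it explicitly.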

## Virtual machine / emulator
Depending on how abstract the instructions are that an interpreter reads, people may also call it things like **virtual machine** or **emulator**

## Compiler (in a broad sense)
- To further confuse the terminology, with things like Java there's a **compiler** turning Java programs into simpler instructions (**bytecode**)
- It stores them in a file read by an interpreter (**virtual machine**)
- This adds the whole circus once more, just starting from **one level up** the abstraction ladder
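
The same two-step arrangement can be observed in CPython, the reference implementation of Python, which compiles source code to bytecode before its virtual machine executes it. A minimal sketch using the standard `dis` module; the function `square` is just an example.

```python
import dis


def square(n):
    return n * n


# CPython has already compiled the function body to bytecode;
# the raw bytes live on the function's code object.
print(type(square.__code__.co_code))  # <class 'bytes'>

# dis renders the bytecode as readable instructions.
# The exact opcodes differ between CPython versions.
dis.dis(square)
```

The listing printed by `dis.dis` is deliberately not reproduced here, since instruction names change from one CPython release to the next.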

## JIT
* A **JIT** (Just-In-Time) compiler runs **after the program has started**
* It compiles the code (usually bytecode or some kind of VM instructions) on the fly into a form that's usually faster, typically the host CPU's **native instruction set**
* A JIT has access to **dynamic runtime information** whereas a standard compiler doesn't
* Thus, it can make better **optimizations** like inlining functions that are used frequently

## Static vs. dynamic typing
based on (and further reading):
* http://stackoverflow.com/questions/1517582/what-is-the-difference-between-statically-typed-and-dynamically-typed-languages
* https://pchiusano.github.io/2016-09-15/static-vs-dynamic.html
* http://softwareengineering.stackexchange.com/questions/122205/what-is-the-supposed-productivity-gain-of-dynamic-typing

## Static vs. dynamic typing
* https://thesocietea.org/2015/11/programming-concepts-static-vs-dynamic-type-checking/
* https://blog.jooq.org/2014/12/11/the-inconvenient-truth-about-dynamic-vs-static-typing/

## Static vs. dynamic typing
* A **type system** is a collection of rules that assign a property called **type** to various constructs in a computer program
* Types are assigned to variables, expressions, functions, modules...
* Defines a set of **rules** and **protocols** behind how a piece of data is supposed to **behave**
* Reduces the number of **bugs** by **verifying** that data is represented properly throughout a program


## Types of types
* **Primitive** types - common types such as **integers, booleans, floats**, and **characters**
* **Composite** types - composed of more than one primitive type, e.g. an **array** or **record** - all composite types are considered **data structures**
* **Abstract** types - types that do not have a specific implementation (and thus can be represented via multiple types), such as a **hash**, **set**, **queue**, and **stack**
* **Other** types - such as **pointers** (a type which holds as its value a reference to a different memory location) and **functions**


## Type checking
* The process of **verifying** and **enforcing** the constraints of types, i.e. that the types make logical sense in the program so that the program can be executed successfully
* It can occur either at **compile time** or at **runtime**

## Type-safe
* Type checking is all about ensuring that the program is **type-safe**, meaning that the possibility of **type errors** is kept to a minimum
* A **type error** is an erroneous program behavior in which an **operation** occurs (or tries to occur) on a particular **data type** that it's **not meant** to occur on
* This could be a situation where an operation is performed on an integer with the intent that it is a float, or even something such as adding a string and an integer together

## Static vs. dynamic typing

* **Static** typing means that a reference value is manifestly (which is not necessarily the same as at compile time) constrained with respect to the type of the value it can denote
* The language implementation, whether it is a compiler or an interpreter, both enforces and uses these constraints as much as possible
* A language is **dynamically** typed if the type is associated with run-time **values**, and not named variables

In [None]:
// C++ (static typing)
double obj = 3.14;
obj += PI;  // obj = 2*PI;
obj = "Hello";  // ERROR: cannot convert 'const char[6]' to 'double'

In [None]:
# Python (dynamic typing)
obj = 3.14
obj += PI  # obj = 2*PI
obj = "Hello"
obj = ["I", "am", "a", "list", "of", "strings"]

In [None]:
// C++11 (type inference)
std::vector<int> numbers;
// std::vector<int>::iterator iter = numbers.begin();
auto iter = numbers.begin();

## The golden mean
* Dart, C++11 - `auto`, C\# - `var`
* type hints in Python 3.5+ - can be statically checked (e.g., by mypy)

In [None]:
from typing import List
Vector = List[float]

def scale(scalar: float, vector: Vector) -> Vector:
    return [scalar * num for num in vector]

# typechecks; a list of floats qualifies as a Vector.
new_vector = scale(2.0, [1.0, -4.2, 5.4])

## Explicitly vs. implicitly typed
* Many widely-used "**industry**" languages such as Java, C, or C++ are **explicitly** typed
* In other words, they require **lots of type declarations**
* In the world of less explicitly typed languages, where these declarations are optional, the declarations are often called **type annotations**

## Explicitly typed vs. type inference
* The first **statically typed** languages were **explicitly typed** by necessity
* **Type inference** algorithms employ techniques for looking at source code with **no type declarations** at all, and deciding what the types of its variables are
* Type inference is typical for languages such as **OCaml, Haskell, Scala**


## Advantages of static typing
* Better **design** - being forced to think about the types of values in your software up front **can push you** towards **cleaner**, more **logical** solutions. (**can** - it's still possible to design really bad code...)
* Better **compile time checking** - static typing can enable more errors to be caught at compile time - arguably the best thing about statically typed languages overall
* **Auto-completion** - static typing can also give more information to the IDE so that auto-completion of code or documentation lookup is more effective

## Advantages of static typing
* **Discourages hacks** - you have to keep type **discipline** in your code, which is likely to be an advantage for long term **maintainability**
* **Type inference** can get you many of the conciseness benefits of dynamic languages while still maintaining type discipline

## Advantages of dynamic typing
* More **concise** - A lot of extraneous **boilerplate code can be removed** if everything is dynamically typed - type declarations, typecasting logic etc. All other things being equal, **shorter code** is marginally quicker to write, but more importantly it can be **quicker to read and maintain** (since you don't need to wade through many pages of code to get a grip on what is happening)
* Easier to **"hack"** - techniques such as **duck typing** and **monkey patching** can get you results very quickly (although they might confuse you later on...)
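
A minimal Python sketch of the two techniques named above; the classes `Duck` and `Robot` and the function `announce` are invented for illustration.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def announce(thing):
    # Duck typing: any object with a speak() method is acceptable;
    # no shared base class or declared interface is required.
    return thing.speak().upper()


print(announce(Duck()))   # QUACK
print(announce(Robot()))  # BEEP


# Monkey patching: replace a method on an existing class at run time.
def loud_speak(self):
    return "Quack!!!"


Duck.speak = loud_speak
print(announce(Duck()))   # QUACK!!!
```

The patch affects every `Duck` instance, including ones created earlier, which is exactly why this technique can confuse later readers of the code.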

## Advantages of dynamic typing
* More **interactive** - dynamic typing is arguably more suitable for **interactive**, **REPL** (Read-Eval-Print Loop)-like programming for **rapid prototyping**, **real-time debugging** of running program instances or even **live coding**
* **Test cases** can catch the **runtime errors** - assuming you are using **TDD** (Test-Driven Development) or at the very least have a good **test suite**, this should pick up any typing issues in your code



## Advantages of dynamic typing
* Better **polymorphism** - dynamic languages are potentially more likely to **encourage** the creation of **polymorphic** functions and abstractions, which can boost productivity and code re-use - for example, Clojure makes great use of dynamic polymorphism in its many abstractions
* **Prototypes** - **prototype-based** data / object models are in my view more powerful and flexible than statically typed **inheritance hierarchies** - dynamic languages are more likely to allow or encourage a prototype-based approach, JavaScript being a great example



## Strongly vs. weakly typed
* A **strongly-typed** language - variables are bound to specific data types, and will result in type errors if types do not match up as expected in the expression - regardless of when type checking occurs
* High degrees of type safety
* A strongly-typed language would result in an explicit type error which ends the program's execution, thus forcing the developer to fix the bug in:

In [1]:
a = 1 + '2'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

## Weakly typed
* A **weakly-typed** language - variables can be implicitly coerced to unrelated types
* PHP, Perl, Rexx, C

In [None]:
// PHP
$foo = "x";
$foo = $foo + 2; // not an error in PHP 5/7 (at most a warning); PHP 8 throws a TypeError
echo $foo;       // 2

In [None]:
a  = 9
b = "9"
c = concatenate(a, b)  // produces "99"
d = add(a, b)          // produces 18

In [None]:
# Python (strong typing)
number = "13"
result = number / 2  # TypeError: unsupported operand types for /

In [None]:
# Perl, PHP (weak typing)
$number = "13";
$result = $number / 2;  # OK: implicit conversion "13" -> 13

## Early vs. late binding
* https://en.wikibooks.org/wiki/Introduction_to_Programming_Languages/Binding
* https://thesocietea.org/2015/11/programming-concepts-static-vs-dynamic-type-checking/
* http://stackoverflow.com/questions/10580/what-is-the-difference-between-early-and-late-binding
* http://softwareengineering.stackexchange.com/questions/200115/what-is-early-and-late-binding

## Early vs. late binding
* For **types**
  * **Early** binding - type is known before the variable is exercised during run-time, usually through a static, declarative means
  * **Late** binding - type is unknown until the variable is exercised during run-time; dynamically typed languages call this an underlying feature, but many statically typed languages have some method of achieving late binding
  * Implemented often using (special) dynamic types, introspection/reflection, flags and compiler options

## Early vs. late binding
* For **functions**
  * **Static dispatch** - known, specific function or subroutine at compile time; it is unambiguous and matched by the signature
  * Implemented as **static functions**; no two methods can have the same signature

## Early vs. late binding
* For **functions**
  * **Dynamic dispatch** - not a specific function or subroutine at compile time; determined by the context during execution
  * Implemented as **virtual** or abstract functions; other clues include overridden, hidden, or shadowed methods
  * Whether or not method overloading involves dynamic dispatch is **language-specific**, for example, in Java, overloaded methods are statically dispatched
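
A minimal Python sketch of dynamic dispatch. In Python every method call is resolved at run time from the object's class, so the example shows only the late-bound case; the class names are invented for illustration.

```python
class Shape:
    def area(self):
        raise NotImplementedError

    def describe(self):
        # Which area() runs is decided at run time from the object's class.
        return f"{type(self).__name__} with area {self.area()}"


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


for shape in [Square(3), Rectangle(2, 5)]:
    print(shape.describe())
# Square with area 9
# Rectangle with area 10
```

`describe` is written once in the base class, yet it calls a different `area` for each object, which is what a virtual function provides in C++ or Java.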

## Lazy vs. eager evaluation (loading)
* binding for **values**
  * **Lazy** loading is an object initialization strategy that **defers value assignment until needed**
  * Allows an object to be in an essentially valid but knowingly **incomplete state** and **waiting** until the data is needed before loading it
  * Often found particularly useful for loading **large datasets** or waiting on **external resources**

## Lazy loading
* Implemented often by purposefully **not loading** a collection or list into a composite object **during** the constructor or **initialization** calls
* Until some downstream **caller asks** to see the contents of that collection
* Variations include loading **meta information** about the collection (like size or keys), but **omitting the actual data**
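
One way to sketch lazy loading of a collection in Python is `functools.cached_property` (available from Python 3.8). The `Report` class and its string `source` are hypothetical stand-ins for an object backed by a file or database.

```python
from functools import cached_property


class Report:
    def __init__(self, source):
        self.source = source      # cheap: only remember where the data lives
        self.load_count = 0       # counts how often the expensive load ran

    @cached_property
    def rows(self):
        # Runs on first access only; the result is then stored on the instance.
        self.load_count += 1
        return [line.split(",") for line in self.source.splitlines()]


report = Report("a,1\nb,2")
print(report.load_count)  # 0 -> nothing loaded yet
print(report.rows)        # [['a', '1'], ['b', '2']]
print(report.rows)        # served from the cache
print(report.load_count)  # 1 -> loaded exactly once
```

The constructor leaves the object valid but incomplete; the data is parsed only when a caller first asks for `rows`.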

In [2]:
def f():
    # infinite Fibonacci generator: each value is computed only on demand
    a = b = 1
    while True:
        yield a
        a, b = b, a + b

In [3]:
for i in f():
  if i > 100:
    break
  else:
    print(i)

1
1
2
3
5
8
13
21
34
55
89


In [4]:
import itertools
even_numbers = itertools.count(2, step=2)

In [5]:
for number in even_numbers:
  if number > 10:
    break
  else:
    print(number)

2
4
6
8
10


In [None]:
(def fibs 
  (map first 
       (iterate 
           (fn [[ a, b       ]]  
                [ b, (+ a b) ]) 
           [0, 1])))     

In [None]:
(def fib-seq (lazy-cat [0 1]
 (map + fib-seq (rest fib-seq))))

## Manual vs. automatic memory management
* Dynamically **allocate** portions of **memory** to programs at their request, and **free it for reuse** when no longer needed
* **Garbage collection** is the automatic recycling of dynamically allocated memory
* It is performed by a garbage collector which recycles memory that it **can prove** will never be used again
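
A minimal sketch of garbage collection at work, assuming CPython, which combines reference counting with a cycle detector exposed through the `gc` module. The `Node` class is invented for illustration.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None


# Build a reference cycle: a -> b -> a.
a = Node("a")
b = Node("b")
a.partner = b
b.partner = a

# A weak reference lets us observe the object without keeping it alive.
probe = weakref.ref(a)

# Drop our own references; the two objects still refer to each other,
# so reference counting alone cannot free them.
del a, b

gc.collect()            # run the cycle detector explicitly
print(probe() is None)  # True in CPython: the cycle has been reclaimed
```

Other Python implementations may reclaim memory at different moments, so the timing shown here should not be relied on in portable code.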

## Common features of scripting languages
* powerful built-in data types (list, hash)
* excellent support for text processing (incl. regular expressions)
* easy to run external programs and to process their outputs (system programming)
* filesystem manipulation
* transferability across platforms
* availability of modules extending the basic functionality

## An incomplete list of scripting languages
* **PHP** - a server-side scripting language designed primarily for **web development**, PHP code may be embedded into HTML code, or it can be used in combination with various web template systems, web content management systems and web frameworks
* **JavaScript** - one of the three core technologies of **World Wide Web** content production, **prototype-based** language with first-class functions, making it a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles; **client**- as well as **server** side

## An incomplete list of scripting languages
* **Lua** - a **lightweight** language designed primarily for **embedded** systems and clients, written in ANSI C, with a relatively simple C API
* **Perl** - "the Swiss Army chainsaw of scripting languages" (**flexibility and power**, "ugliness"), "duct tape that holds the Internet together" (ubiquitous use as a glue language, perceived **inelegance**)

## An incomplete list of scripting languages
* **command line shells** (bash, zsh...) - command processors calling typically **external programs**, especially from GNU Coreutils
* **PowerShell** - a task automation and configuration management framework from **Microsoft**, consisting of a command-line shell and associated scripting language built on the **.NET Framework**

## An incomplete list of scripting languages
* **Ruby** - a genuinely **object-oriented** language, gaining awareness thanks to **Ruby on Rails**
* **Python** - ideal as the first programming language, extension modules, standard interface language for various data crunching systems


## Python teaser
* default values of function parameters
* keyword arguments
* variable number of arguments

In [None]:
def format_number(number, precision=3, comma=".", thousands=" "):
    # implement the functionality
    # ...
    return number
print(format_number(1))

    number = format_number(11.423629, comma=",", precision=2)
    plot(data, xlabel="x", ylabel="f(x)", xrange=[0, 10], title="Demo")

In [None]:
#Python
def calculate_sum(*numbers):
    summary = 0
    for number in numbers:
        summary += number
    return summary

calculate_sum()  # 0
calculate_sum(1, 87, 3.14, 1 + 2j, 3)  # 95.14 + 2j

## Short code
* reading real numbers from a file

In [None]:
# %load numbers.txt
1.1 9 5.2
1.762543e-02
0 0.01 0.001
9 3 7


In [6]:
filename = "numbers.txt"
with open(filename) as numbfile:
  numbers = numbfile.read().split()

for number in numbers:
    print(number)

1.1
9
5.2
1.762543e-02
0
0.01
0.001
9
3
7
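
The values printed above are still strings. A minimal sketch of turning them into real numbers; the `content` string is a stand-in for the text of `numbers.txt`, so the example runs without the file.

```python
# Hypothetical stand-in for the text of numbers.txt shown above.
content = "1.1 9 5.2\n1.762543e-02\n0 0.01 0.001\n9 3 7\n"

# split() with no arguments splits on any whitespace, including newlines.
numbers = [float(token) for token in content.split()]

print(len(numbers))   # 10
print(numbers[3])     # 0.01762543
print(sum(numbers))   # total of all values
```

`float` accepts both plain and scientific notation, so no special handling is needed for `1.762543e-02`.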


In [None]:
# Perl
open F, $filename;
$content = join "", <F>;
@numbers = split ' ', $content;

In [None]:
# Python
import re
haystack = "(-3, 1.4)"
needle = r"\(\s*([^,]+)\s*,\s*([^,]+)\s*\)"
match = re.search(needle, haystack)
real, imag = (float(x) for x in match.groups())  # avoid shadowing the re module

In [None]:
# Perl
$haystack = "(-3, 1.4)";
$needle = qr/\(\s*([^,]+)\s*,\s*([^,]+)\s*\)/;
($re, $im) = $haystack =~ $needle;

In [7]:
# Python: download file from URL
from urllib.request import urlopen

url = "http://httpbin.org/headers"
with urlopen(url) as response:
    content = response.read()

In [8]:
# Python: convert CSV to JSON
import csv
import json

with open("organizations.csv") as orgfile:
  lines = list(csv.reader(orgfile))
  print(json.dumps(lines))

[["196680;653838;PRACE-4IP;participant;999868144;\"VYSOKA SKOLA BANSKA - TECHNICKA UNIVERZITA OSTRAVA\";VSB;HES;false;272021;CZ;\"17 LISTOPADU 15/2172\";\"OSTRAVA PORUBA\";\"70 833\";www.vsb.cz;;;;;;;;"], ["194816;652816;GasOn;participant;999848744;\"CESKE VYSOKE UCENI TECHNICKE V PRAZE\";CTU;HES;false;149775;CZ;\"ZIKOVA 4\";PRAHA;16636;www.cvut.cz;;;;;;;;"], ["194883;653514;OSEM-EV;participant;999873091;\"VYSOKE UCENI TECHNICKE V BRNE\";BUT;HES;false;400000;CZ;\"ANTONINSKA 548/1\";\"BRNO STRED\";\"601 90\";www.vutbr.cz;;;;;;;;"]]
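
In the output above each row comes back as a single string, which suggests that the file uses `;` rather than the default `,` as its field separator. A minimal sketch, assuming that is the case; the two sample lines are shortened, made-up excerpts in the same layout.

```python
import csv
import io
import json

# Hypothetical two-line excerpt in the same semicolon-separated layout.
sample = (
    '194816;652816;GasOn;participant;"CESKE VYSOKE UCENI TECHNICKE V PRAZE";CTU\n'
    '194883;653514;OSEM-EV;participant;"VYSOKE UCENI TECHNICKE V BRNE";BUT\n'
)

# delimiter=";" tells the reader where fields end; quotes are handled as usual.
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
print(json.dumps(rows[0]))
# ["194816", "652816", "GasOn", "participant", "CESKE VYSOKE UCENI TECHNICKE V PRAZE", "CTU"]
```

With the delimiter set, each record becomes a list of separate fields instead of one long string.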


## When to use scripting languages
* the task is to combine existing components
* string/text manipulation
* application code will be changed frequently
* CPU-intensive parts are (can be) implemented in a compiled language
* one can take advantage of lists and hashes
* the application will communicate with web servers

## When not to use scripting languages
* loops over very large data structures
* complex algorithms with complex data structures
* enormous data sets and run-time-critical tasks
* functionality well defined and should not change for a long time
* strict type control required (large development teams)

## Performance benchmarks
* various comparisons on the web
* diverse methodology (diverse order)
* results generally depend on HW, algorithms, particular constructs employed
* language implementations can always be accelerated (Google V8, PyPy)
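
To see how much the particular construct matters, one can time two equivalent pieces of code with the standard `timeit` module. This is only a sketch: the two functions are invented for illustration, and the absolute numbers depend on the machine and the Python implementation.

```python
import timeit


def loop_sum(n):
    total = 0
    for x in range(n):
        total += x * x
    return total


def builtin_sum(n):
    return sum(x * x for x in range(n))


n = 1_000
for func in (loop_sum, builtin_sum):
    # Best of several repeats reduces noise from other processes.
    best = min(timeit.repeat(lambda: func(n), number=100, repeat=3))
    print(f"{func.__name__}: {best:.4f} s per 100 calls")
```

Running the same script under a different interpreter, such as PyPy, is a quick way to observe the effect of the language implementation as well.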


## Performance benchmarks
* https://blog.famzah.net/2016/02/09/cpp-vs-python-vs-perl-vs-php-performance-benchmark-2016/
* http://benchmarksgame.alioth.debian.org/u64q/python.html
* http://karlheinzniebuhr.github.io/en/2015/09/28/C-vs-Go-vs-pypy-vs-Python/


## Installation
* see https://www.fit.vutbr.cz/study/courses/ISJ/private/
* https://www.continuum.io/downloads
* jupyter notebook
* https://www.ruby-lang.org/en/documentation/installation/
* https://github.com/SciRuby/iruby

## Next lectures
* 2nd - GNU Coreutils
* 3rd - Regular expressions


## GNU CoreUtils
* https://www.gnu.org/software/coreutils/coreutils.html
* http://www.grymoire.com/Unix/Sed.html
* http://stackabuse.com/zsh-vs-bash/

## Regular expressions
* https://pycon2016.regex.training/
* http://www.informit.com/articles/article.aspx?p=1310965
* https://code.tutsplus.com/tutorials/advanced-regular-expression-tips-and-techniques--net-11011
* http://www.rexegg.com/regex-lookarounds.html
* http://www.regular-expressions.info/lookaround.html