### Example of using Multiple Dispatch, IO, and DataFrames

In [1]:
;ls

bank-full.csv
banktest.jl
componentesprincipais.ipynb
fileIO.ipynb
iris.csv
myexp.jl
normtype.jl
Preliminares.ipynb
README.md
tests.jl


In [2]:
#= If the file you want to include is in the current directory, just
call include("file_name.jl"); otherwise the full path must be given
in the string =#
include("myexp.jl")

myexp (generic function with 2 methods)

In [3]:
?myexp

search: myexp



```
myexp(x, n::Integer)
```

Returns `x*x*...*x`, with `x` appearing n times. Returns the same as `x*myexp(x, n-1)` if n > 0.


Because `myexp` was defined with no type restriction on x, `myexp(x, n)` stays generic over the type of x for as long as possible, that is, until some product (*) actually has to be evaluated.
So if x is of a type for which the * operator is defined, `myexp` works.
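
The source of `myexp.jl` is not shown here, so the following is only a minimal sketch of how such a function could be written. The name `power_sketch` and its error handling are assumptions, not the author's code. It illustrates that leaving `x` untyped is enough for the same definition to serve any type that implements `*`.

```julia
# Hypothetical sketch; the real definition lives in myexp.jl and may differ.
function power_sketch(x, n::Integer)
    n >= 1 || throw(ArgumentError("n must be positive"))
    # Base case: a single factor. Otherwise peel one factor off and recurse.
    return n == 1 ? x : x * power_sketch(x, n - 1)
end

power_sketch(2, 3)            # integers
power_sketch(2.0, 3)          # floats
power_sketch([2 0; 0 2], 3)   # matrices, via matrix multiplication
power_sketch("ha", 3)         # strings, via concatenation
```

Only the exponent is constrained (`n::Integer`), so dispatch on `*` decides what "multiplication" means for each call.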

In [4]:
myexp(2,3)

8

In [5]:
myexp(2.0,3)

8.0

In [6]:
m = [2 0;
     0 2]
myexp(m,3)

2×2 Array{Int64,2}:
 8  0
 0  8

In [7]:
# this is how we find out that * is the string concatenation operator
myexp("ha", 3)

"hahaha"

In [8]:
myexp(2, true)

2

The behavior above occurs because `myexp` requires the exponent to be of type `Integer` and `Bool` happens to be a subtype of `Integer`.
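
The subtype relation can be checked directly at the REPL. The short sketch below uses only built-in functions and shows why `true` is accepted as an exponent and behaves like `1`.

```julia
Bool <: Integer        # true: Bool is a subtype of Integer
supertype(Bool)        # Integer
true == 1              # true: `true` compares equal to the integer 1
true + true            # 2: arithmetic on Bool promotes to Int
```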

In [9]:
# A ; at the end of a command suppresses
# the printing of its return value
include("banktest.jl");

In [10]:
# readdlm is part of the DelimitedFiles module;
# to use it, the module must first be loaded with
using DelimitedFiles
arq = readdlm("bank-full.csv", ';', header=true)

(Any[58 "management" … "unknown" "no"; 44 "technician" … "unknown" "no"; … ; 57 "blue-collar" … "unknown" "no"; 37 "entrepreneur" … "other" "no"], AbstractString["age" "job" … "poutcome" "y"])

Wow, what is all this?

In [11]:
# Tuples are generic containers.
# Functions with more than one return value return a tuple holding those values
typeof(arq)

Tuple{Array{Any,2},Array{AbstractString,2}}
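
As a small illustration of the point about tuples, the sketch below destructures a tuple into separate variables. The helper `minmax_sketch` and the in-memory table are made up for the example; an `IOBuffer` stands in for `bank-full.csv` so the snippet runs without the file.

```julia
using DelimitedFiles

# A function with several return values returns them as one tuple.
minmax_sketch(v) = (minimum(v), maximum(v))   # hypothetical helper
lo, hi = minmax_sketch([3, 1, 2])             # destructure the tuple

# readdlm also accepts an IO object, so a tiny in-memory table replaces the file.
io = IOBuffer("age;job\n58;management\n44;technician\n")
cells, header = readdlm(io, ';', header=true)
size(cells), size(header)
```

Destructuring avoids indexing the result with `[1]` and `[2]` later on.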

In [12]:
# It looks like the first entry of the 2-tuple holds the data
# and the other one holds the variable names
println(size(arq[1]))
println(size(arq[2]))

(45211, 17)
(1, 17)


In [13]:
data = arq[1]

45211×17 Array{Any,2}:
 58  "management"    "married"   …   261  1   -1   0  "unknown"  "no" 
 44  "technician"    "single"        151  1   -1   0  "unknown"  "no" 
 33  "entrepreneur"  "married"        76  1   -1   0  "unknown"  "no" 
 47  "blue-collar"   "married"        92  1   -1   0  "unknown"  "no" 
 33  "unknown"       "single"        198  1   -1   0  "unknown"  "no" 
 35  "management"    "married"   …   139  1   -1   0  "unknown"  "no" 
 28  "management"    "single"        217  1   -1   0  "unknown"  "no" 
 42  "entrepreneur"  "divorced"      380  1   -1   0  "unknown"  "no" 
 58  "retired"       "married"        50  1   -1   0  "unknown"  "no" 
 43  "technician"    "single"         55  1   -1   0  "unknown"  "no" 
 41  "admin."        "divorced"  …   222  1   -1   0  "unknown"  "no" 
 29  "admin."        "single"        137  1   -1   0  "unknown"  "no" 
 53  "technician"    "married"       517  1   -1   0  "unknown"  "no" 
  ⋮                              ⋱                    

In [14]:
data_names = arq[2]

1×17 Array{AbstractString,2}:
 "age"  "job"  "marital"  "education"  …  "previous"  "poutcome"  "y"

In [15]:
# The data matrix holds different data types in its columns
typeof.(data[1,:])

17-element Array{DataType,1}:
 Int64            
 SubString{String}
 SubString{String}
 SubString{String}
 SubString{String}
 Int64            
 SubString{String}
 SubString{String}
 SubString{String}
 Int64            
 SubString{String}
 Int64            
 Int64            
 Int64            
 Int64            
 SubString{String}
 SubString{String}

The advantage of Multiple Dispatch, from the point of view of a user of the language, is being able to write different versions of the same function to handle different types. When the argument types are strictly constrained, this also leads to a performance gain.

In the example presented here, this feature made it possible to write a `preprocess` function with two methods, one handling integers and the other handling strings.
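
To make the idea concrete, here is a minimal sketch of one function with two methods selected by argument type. The name `encode_sketch` and the integer-code encoding are assumptions chosen for brevity; they are not what `banktest.jl` does.

```julia
# Hypothetical helper; `banktest.jl` may organize its methods differently.
# Numeric method: convert the values to Float64.
encode_sketch(col::AbstractVector{<:Integer}) = Float64.(col)

# String method: replace each value by its position among the sorted levels.
function encode_sketch(col::AbstractVector{<:AbstractString})
    levels = sort(unique(col))
    return [Float64(findfirst(==(v), levels)) for v in col]
end

ages = [58, 44, 33]
jobs = ["management", "technician", "management"]
encode_sketch(ages)   # dispatches to the Integer method
encode_sketch(jobs)   # dispatches to the AbstractString method
```

Note that a `Vector{Any}` would match neither method, so a column taken from an `Array{Any,2}` would have to be dispatched on some other basis, for example the type of its elements.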

In [16]:
?preprocess

search: preprocess



```
preprocess(data, col[, levels]) -> Matrix{Float64}
```

Preprocesses the col-th column of data. If the list of levels is provided, preprocess is more efficient. When the levels of the selected column are nominal, preprocess returns a matrix of dummy indicator variables, one for each of the categories contained in the given column of matrix data, except for the last level. This exception is a consequence of the last level being uniquely determined by the others. Including the last level would make the output matrix singular. When the levels of the selected column are numerical, preprocess returns a vector of those values converted to Float64.
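
The dummy-indicator encoding described in the docstring can be sketched as follows. This is an illustration under assumptions, not the implementation in `banktest.jl`: the name `dummies_sketch` is made up, and the levels are taken in order of first appearance when none are supplied.

```julia
# Hypothetical sketch of dummy coding with the last level dropped.
function dummies_sketch(col::AbstractVector, levels = unique(col))
    kept = levels[1:end-1]              # the last level is implied by the others
    out = zeros(Float64, length(col), length(kept))
    for (j, level) in enumerate(kept), (i, v) in enumerate(col)
        out[i, j] = v == level ? 1.0 : 0.0
    end
    return out
end

marital = ["married", "single", "married", "divorced"]
dummies_sketch(marital)   # rows of the dropped level "divorced" are all zeros
```

A row of zeros identifies the dropped level, which is why adding a column for it would carry no extra information.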


In [17]:
# The buildmultivar function builds a matrix from the columns given in the list
# Multiple Dispatch takes care of sending the right types to the
# right methods
D = buildmultivar(data, [1, 3])

45211×3 Array{Float64,2}:
 58.0  1.0  0.0
 44.0  0.0  1.0
 33.0  1.0  0.0
 47.0  1.0  0.0
 33.0  0.0  1.0
 35.0  1.0  0.0
 28.0  0.0  1.0
 42.0  0.0  0.0
 58.0  1.0  0.0
 43.0  0.0  1.0
 41.0  0.0  0.0
 29.0  0.0  1.0
 53.0  1.0  0.0
  ⋮            
 34.0  0.0  1.0
 38.0  1.0  0.0
 53.0  1.0  0.0
 34.0  0.0  1.0
 23.0  0.0  1.0
 73.0  1.0  0.0
 25.0  0.0  1.0
 51.0  1.0  0.0
 71.0  0.0  0.0
 72.0  1.0  0.0
 57.0  1.0  0.0
 37.0  1.0  0.0

Julia also offers the Data Frame data structure.

In [18]:
using DataFrames

# I believe the CSV.jl package removes the need to use the Symbol function.
# I admit that this requirement looks esoteric at this point
df = DataFrame(data, Symbol.(vec(data_names)))

Row,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
1,58,management,married,tertiary,no,2143,yes,no,unknown,5,may,261,1,-1,0,unknown,no
2,44,technician,single,secondary,no,29,yes,no,unknown,5,may,151,1,-1,0,unknown,no
3,33,entrepreneur,married,secondary,no,2,yes,yes,unknown,5,may,76,1,-1,0,unknown,no
4,47,blue-collar,married,unknown,no,1506,yes,no,unknown,5,may,92,1,-1,0,unknown,no
5,33,unknown,single,unknown,no,1,no,no,unknown,5,may,198,1,-1,0,unknown,no
6,35,management,married,tertiary,no,231,yes,no,unknown,5,may,139,1,-1,0,unknown,no
7,28,management,single,tertiary,no,447,yes,yes,unknown,5,may,217,1,-1,0,unknown,no
8,42,entrepreneur,divorced,tertiary,yes,2,yes,no,unknown,5,may,380,1,-1,0,unknown,no
9,58,retired,married,primary,no,121,yes,no,unknown,5,may,50,1,-1,0,unknown,no
10,43,technician,single,secondary,no,593,yes,no,unknown,5,may,55,1,-1,0,unknown,no


In [19]:
describe(df)

Row,variable,mean,min,median,max,nunique,nmissing,eltype
1,age,40.9362,18,,95,77,0,Any
2,job,,admin.,,unknown,12,0,Any
3,marital,,divorced,,single,3,0,Any
4,education,,primary,,unknown,4,0,Any
5,default,,no,,yes,2,0,Any
6,balance,1362.27,-8019,,102127,7168,0,Any
7,housing,,no,,yes,2,0,Any
8,loan,,no,,yes,2,0,Any
9,contact,,cellular,,unknown,3,0,Any
10,day,15.8064,1,,31,31,0,Any
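
Every column above reports `eltype` `Any`, which is what one would expect when the data frame is built from an `Array{Any,2}`. As a hedged aside, the sketch below shows one way to narrow such a column to a concrete element type using plain arrays; the variable names are illustrative only.

```julia
col = Any[58, 44, 33]          # a column as it comes out of an Array{Any,2}
typed = identity.(col)         # broadcasting picks the narrowest element type
eltype(col), eltype(typed)     # Any before, a concrete integer type after
```

The same broadcast on a column of strings would yield a vector with a concrete string element type.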


In [20]:
tail(df)

Row,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
1,25,technician,single,secondary,no,505,no,yes,cellular,17,nov,386,2,-1,0,unknown,yes
2,51,technician,married,tertiary,no,825,no,no,cellular,17,nov,977,3,-1,0,unknown,yes
3,71,retired,divorced,primary,no,1729,no,no,cellular,17,nov,456,2,-1,0,unknown,yes
4,72,retired,married,secondary,no,5715,no,no,cellular,17,nov,1127,5,184,3,success,yes
5,57,blue-collar,married,secondary,no,668,no,no,telephone,17,nov,508,4,-1,0,unknown,no
6,37,entrepreneur,married,secondary,no,2971,no,no,cellular,17,nov,361,2,188,11,other,no


In [21]:
# Everything you need to know to open files and write to them
?open

search: open isopen propertynames CompositeException operm getproperty



```
open(filename::AbstractString; keywords...) -> IOStream
```

Open a file in a mode specified by five boolean keyword arguments:

| Keyword    | Description            | Default                               |
|:---------- |:---------------------- |:------------------------------------- |
| `read`     | open for reading       | `!write`                              |
| `write`    | open for writing       | `truncate \| append`                  |
| `create`   | create if non-existent | `!read & write \| truncate \| append` |
| `truncate` | truncate to zero size  | `!read & write`                       |
| `append`   | seek to end            | `false`                               |

The default when no keywords are passed is to open files for reading only. Returns a stream for accessing the opened file.

---

```
open(filename::AbstractString, [mode::AbstractString]) -> IOStream
```

Alternate syntax for open, where a string-based mode specifier is used instead of the five booleans. The values of `mode` correspond to those from `fopen(3)` or Perl `open`, and are equivalent to setting the following boolean groups:

| Mode | Description                   | Keywords                       |
|:---- |:----------------------------- |:------------------------------ |
| `r`  | read                          | none                           |
| `w`  | write, create, truncate       | `write = true`                 |
| `a`  | write, create, append         | `append = true`                |
| `r+` | read, write                   | `read = true, write = true`    |
| `w+` | read, write, create, truncate | `truncate = true, read = true` |
| `a+` | read, write, create, append   | `append = true, read = true`   |

# Examples

```jldoctest
julia> io = open("myfile.txt", "w");

julia> write(io, "Hello world!");

julia> close(io);

julia> io = open("myfile.txt", "r");

julia> read(io, String)
"Hello world!"

julia> write(io, "This file is read only")
ERROR: ArgumentError: write failed, IOStream is not writeable
[...]

julia> close(io)

julia> io = open("myfile.txt", "a");

julia> write(io, "This stream is not read only")
28

julia> close(io)

julia> rm("myfile.txt")
```

---

```
open(f::Function, args...; kwargs...)
```

Apply the function `f` to the result of `open(args...; kwargs...)` and close the resulting file descriptor upon completion.

# Examples

```jldoctest
julia> open("myfile.txt", "w") do io
           write(io, "Hello world!")
       end;

julia> open(f->read(f, String), "myfile.txt")
"Hello world!"

julia> rm("myfile.txt")
```

---

```
open(command, stdio=devnull; write::Bool = false, read::Bool = !write)
```

Start running `command` asynchronously, and return a tuple `(stream,process)`.  If `read` is true, then `stream` reads from the process's standard output and `stdio` optionally specifies the process's standard input stream.  If `write` is true, then `stream` writes to the process's standard input and `stdio` optionally specifies the process's standard output stream.

---

```
open(f::Function, command, mode::AbstractString="r", stdio=devnull)
```

Similar to `open(command, mode, stdio)`, but calls `f(stream)` on the resulting process stream, then closes the input stream and waits for the process to complete. Returns the value returned by `f`.
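
To tie the mode table and the `do`-block form of `open` together, here is a short sketch that writes, appends to, and reads back a file. The file name `notes.txt` and its contents are made up for the example, and everything happens in a temporary directory that is removed afterwards.

```julia
# Work in a temporary directory so no existing file is touched.
lines = mktempdir() do dir
    path = joinpath(dir, "notes.txt")   # hypothetical file name

    # "w" creates or truncates the file; the do block closes it afterwards.
    open(path, "w") do io
        println(io, "age;job")
        println(io, "58;management")
    end

    # "a" appends to the end instead of truncating.
    open(path, "a") do io
        println(io, "44;technician")
    end

    # Read the file back, one string per line.
    readlines(path)
end
```

Using the `do` form means the stream is closed even if the body throws, so no explicit `close(io)` is needed.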
