
**Author:** Luiz Barboza

**Date:** 20/dec/22

**Title:** Data Analysis (Summarization)

**Lang:** Julia

# Julia Installation

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.8.3" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools"
JULIA_PACKAGES_IF_GPU="CUDA" # or CuArrays for older Julia versions
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  nvidia-smi -L &> /dev/null && export GPU=1 || export GPU=0
  if [ $GPU -eq 1 ]; then
    JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia  

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.8.3 on the current Colab Runtime...
2022-12-21 00:21:16 URL:https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.3-linux-x86_64.tar.gz [130030846/130030846] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package BenchmarkTools...
Installing Julia package CUDA...


In [1]:
versioninfo()

Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, haswell)
  Threads: 2 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/lib64-nvidia
  LD_PRELOAD = /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
  JULIA_NUM_THREADS = 2


# Data Analysis and Summarization

## Libs

In [None]:
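import Pkg
# install the third-party packages (needed once per Colab runtime)
Pkg.add("CSV")
Pkg.add("DataFrames")
# Statistics ships with Julia as a standard library; adding it to the environment is harmless
Pkg.add("Statistics")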

using CSV
using DataFrames
using Statistics


## Data Analysis

In [9]:
;wget https://raw.githubusercontent.com/lcbjrrr/data/main/grades%20-%20okk.csv

--2022-12-21 00:27:58--  https://raw.githubusercontent.com/lcbjrrr/data/main/grades%20-%20okk.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 338 [text/plain]
Saving to: ‘grades - okk.csv.2’

     0K                                                       100% 21.5M=0s

2022-12-21 00:27:58 (21.5 MB/s) - ‘grades - okk.csv.2’ saved [338/338]



In [11]:
# read the CSV file into a DataFrame
grades = CSV.read("grades - okk.csv", DataFrame)
print(grades)

[1m12×7 DataFrame[0m
[1m Row [0m│[1m Course  [0m[1m Student [0m[1m AP1     [0m[1m AP2     [0m[1m AP3     [0m[1m Grade   [0m[1m Score   [0m
     │[90m String3 [0m[90m String7 [0m[90m Float64 [0m[90m Float64 [0m[90m Float64 [0m[90m Float64 [0m[90m String1 [0m
─────┼───────────────────────────────────────────────────────────────
   1 │ ADM      João         9.0      8.0      9.0      8.6  B
   2 │ ADM      Maria        6.0      4.0     10.0      6.0  D
   3 │ LAW      José         4.0      3.0      4.0      3.6  F
   4 │ LAW      Pedro        8.0     10.0      7.0      8.6  B
   5 │ ECO      Paulo        7.5      8.0      9.5      8.1  B
   6 │ LAW      Esther       6.0      4.5      6.0      5.4  D
   7 │ ADM      Gabriel      8.0      6.0      8.0      7.2  B
   8 │ LAW      Rafael       7.5     10.0      9.5      8.9  B
   9 │ ECO      Davi         6.0     10.0      7.0      7.8  B
  10 │ LAW      Silvio      10.0      9.5      9.5      9.7  A
  11 │ ADM
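The same `CSV.read(source, DataFrame)` call also accepts an in-memory `IOBuffer` instead of a file name, which can be handy when the `wget` step is not available. A minimal sketch, assuming a hypothetical `csv_text` string that holds three rows copied from the table above (the names `csv_text` and `sample` are illustrative only):

```julia
using CSV, DataFrames

# Hypothetical in-memory copy of three rows from the grades table
csv_text = """
Course,Student,AP1,AP2,AP3,Grade,Score
ADM,João,9.0,8.0,9.0,8.6,B
LAW,José,4.0,3.0,4.0,3.6,F
LAW,Silvio,10.0,9.5,9.5,9.7,A
"""

# CSV.read accepts any IO source, so an IOBuffer stands in for the file name
sample = CSV.read(IOBuffer(csv_text), DataFrame)

println(size(sample))          # (3, 7)
println(eltype(sample.Grade))  # Float64
```

Column types are inferred exactly as for the downloaded file, so `Grade` comes back as `Float64`.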

In [30]:
# select a single column (returns the stored vector, not a copy)
grades.Score

12-element Vector{String1}:
 "B"
 "D"
 "F"
 "B"
 "B"
 "D"
 "B"
 "B"
 "B"
 "A"
 "B"
 "F"
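`grades.Score` hands back the column vector itself. DataFrames.jl has equivalent indexing forms that differ only in whether they copy; a minimal sketch on a small hypothetical `df` built from rows of the table above:

```julia
using DataFrames

# Small hypothetical DataFrame with rows taken from the grades table
df = DataFrame(Student = ["João", "Maria", "José"],
               Score   = ["B", "D", "F"])

col_dot   = df.Score        # the stored vector, no copy
col_bang  = df[!, :Score]   # same as df.Score: no copy
col_colon = df[:, :Score]   # a fresh copy of the column

println(col_dot === col_bang)    # true: both refer to the same vector
println(col_dot === col_colon)   # false: the copy is a different object
println(col_dot == col_colon)    # true: the contents are equal
```

The non-copying forms are cheaper, but mutating their result also mutates the DataFrame; `df[:, :col]` is the safe choice when the column will be modified independently.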

In [40]:
# select several columns (returns a new DataFrame with copied columns)
grades[:,[:Grade,:Score]]

 Row │ Grade    Score
     │ Float64  String1
─────┼──────────────────
   1 │     8.6  B
   2 │     6.0  D
   3 │     3.6  F
   4 │     8.6  B
   5 │     8.1  B
   6 │     5.4  D
   7 │     7.2  B
   8 │     8.9  B
   9 │     7.8  B
  10 │     9.7  A
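`grades[:, [:Grade, :Score]]` copies the chosen columns into a new DataFrame. The `select` function and the `Not` selector express the same kind of column subset; a sketch on a hypothetical three-row `df`:

```julia
using DataFrames

# Hypothetical three-row sample of the grades table
df = DataFrame(Course  = ["ADM", "ADM", "LAW"],
               Student = ["João", "Maria", "José"],
               Grade   = [8.6, 6.0, 3.6],
               Score   = ["B", "D", "F"])

# Two equivalent ways to keep only Grade and Score
by_index  = df[:, [:Grade, :Score]]
by_select = select(df, :Grade, :Score)

# Keep every column except the listed ones
no_names = df[:, Not([:Course, :Student])]

println(by_index == by_select)   # true
println(names(no_names))         # ["Grade", "Score"]
```

`Not` is convenient when a table has many columns and only a few should be dropped.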


In [27]:
# filter rows: keep only the students whose Score is "A"
filter(:Score => ==("A"), grades)

 Row │ Course   Student  AP1      AP2      AP3      Grade    Score
     │ String3  String7  Float64  Float64  Float64  Float64  String1
─────┼───────────────────────────────────────────────────────────────
   1 │ LAW      Silvio      10.0      9.5      9.5      9.7  A
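`filter(:Score => ==("A"), grades)` passes only the `Score` value of each row to the predicate. A predicate can also receive the whole row, or `subset` can be used with `ByRow`. A sketch on a hypothetical `df`, where the 8.0 cut-off is an arbitrary illustration rather than a rule from the course:

```julia
using DataFrames

# Hypothetical four-row sample of the grades table
df = DataFrame(Course  = ["ADM", "LAW", "LAW", "ECO"],
               Student = ["João", "José", "Silvio", "Davi"],
               Grade   = [8.6, 3.6, 9.7, 7.8],
               Score   = ["B", "F", "A", "B"])

# Pair syntax: the function receives the :Score value of each row
only_a = filter(:Score => ==("A"), df)

# Whole-row predicate: combine several columns in one condition
law_high = filter(row -> row.Course == "LAW" && row.Grade >= 8.0, df)

# subset with ByRow: all conditions must hold, same rows as above
law_high2 = subset(df, :Course => ByRow(==("LAW")), :Grade => ByRow(>=(8.0)))

println(only_a.Student)         # ["Silvio"]
println(law_high == law_high2)  # true
```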


In [31]:
# head: the first 2 rows
first(grades,2)

 Row │ Course   Student  AP1      AP2      AP3      Grade    Score
     │ String3  String7  Float64  Float64  Float64  Float64  String1
─────┼───────────────────────────────────────────────────────────────
   1 │ ADM      João         9.0      8.0      9.0      8.6  B
   2 │ ADM      Maria        6.0      4.0     10.0      6.0  D
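`first(grades, 2)` returns the first rows in file order. Combined with `sort`, the same call produces a top-N listing; a sketch on a hypothetical four-row `df`:

```julia
using DataFrames

# Hypothetical four-row sample of the grades table
df = DataFrame(Student = ["João", "Maria", "Rafael", "Silvio"],
               Grade   = [8.6, 6.0, 8.9, 9.7])

# Sort by Grade in descending order, then keep the first 2 rows
top2 = first(sort(df, :Grade, rev = true), 2)

println(top2.Student)   # ["Silvio", "Rafael"]
```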


In [32]:
# tail: the last row
last(grades,1)

 Row │ Course   Student  AP1      AP2      AP3      Grade    Score
     │ String3  String7  Float64  Float64  Float64  Float64  String1
─────┼───────────────────────────────────────────────────────────────
   1 │ ADM      Raquel       4.5      4.0      4.0      4.2  F
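`last(grades, 1)` returns a one-row DataFrame. Called without a count, `last` returns a single `DataFrameRow` instead, and `df[end-1:end, :]` is an indexing equivalent of `last(df, 2)`. A sketch on a hypothetical three-row `df`:

```julia
using DataFrames

# Hypothetical three-row sample of the grades table
df = DataFrame(Student = ["Davi", "Silvio", "Raquel"],
               Grade   = [7.8, 9.7, 4.2])

tail_df  = last(df, 1)   # a 1-row DataFrame
tail_row = last(df)      # a single DataFrameRow

println(tail_df isa DataFrame)             # true
println(tail_row.Student)                  # Raquel
println(last(df, 2) == df[end-1:end, :])   # true
```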


## Summarization (Consolidation) 

In [39]:
# Summarize: group by Course (categorical) and compute the mean of Grade (numerical)
combine(groupby(grades, :Course), :Grade => mean)

 Row │ Course   Grade_mean
     │ String3  Float64
─────┼─────────────────────
   1 │ ADM            6.64
   2 │ LAW            7.24
   3 │ ECO            7.95
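`combine` also accepts several `source => function` pairs at once, an optional `=> :newname` target, and the special `nrow` form that counts rows per group; `describe` gives a quick per-column overview. A sketch using the Course and Grade values of the fully visible rows above (row 11 is truncated in the printed table, so it is left out and the ADM mean here will not match 6.64). The output names `avg_grade`, `best_grade` and `n_students` are illustrative choices:

```julia
using DataFrames, Statistics

# Course and Grade of the fully visible rows (row 11 is truncated above)
df = DataFrame(Course = ["ADM", "ADM", "LAW", "LAW", "ECO", "LAW",
                         "ADM", "LAW", "ECO", "LAW", "ADM"],
               Grade  = [8.6, 6.0, 3.6, 8.6, 8.1, 5.4,
                         7.2, 8.9, 7.8, 9.7, 4.2])

gdf = groupby(df, :Course)

# Several aggregations per group, each with an explicit output column name
course_stats = combine(gdf,
                       :Grade => mean    => :avg_grade,
                       :Grade => maximum => :best_grade,
                       nrow              => :n_students)
println(course_stats)

# Per-column overview of the whole table (mean is `nothing` for the String column)
overview = describe(df, :mean, :min, :max)
println(overview)
```

The order of the groups in `course_stats` is not guaranteed, so look rows up by `Course` rather than by position.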
