# *Transform Data*

---

This notebook explores code that transforms the original data into camouflaged (artificial) data. The transformed data are used to develop tools and functions, but final results are generated from the original data. The data are read from the CSV files.

##### ***Libraries***

In [1]:
###################
# External source #
###################
# Include the modules
dirFunc = realpath("../src/")
include(joinpath(dirFunc, "ActStatData.jl")) # include(joinpath(dirFunc, "ActPlotData.jl"))

#############
# Libraries #
#############
using DataFrames, CSV, Main.ActStatData, Distributions, Dates

##### ***Select one sample to create a simulated dataset***

In [2]:
############
# CONSTANT #
############

# List of visit directories
listDir = ["../data/Baseline Visit Data/";
           "../data/32 Week Gestation Data/";
           "../data/6 Week PP Data/";
           "../data/6 Months PP Data/";
           "../data/12 Months PP Data/"]
numFolder = 1
numFiles = 2;

# data folder path
myDir = realpath(listDir[numFolder])
# get the list of files in the data directory myDir
(myData, myHeader) = ActStatData.filesNoNaN(myDir); 

# generate activity dataframe for one individual data set
dfAct = ActStatData.readActivity(joinpath(myDir,myData[numFiles]));
# generate bio dataframe for one individual data set
dfBio = ActStatData.readActivity(joinpath(myDir, myHeader[numFiles]));
# test

There is no missing (i.e. NA) data log in the following directory:
C:\git\senresearch\AccelerometerDataProcessing\data\Baseline Visit Data


In [13]:
first(dfAct, 3)

| Row | Day | ElapsedSeconds | DateTime | ActivityCounts | Steps | EnergyExpenditure |
|---|---|---|---|---|---|---|
| | Int64 | Int64 | DateTime | Int64 | Int64 | Float64 |
| 1 | 1 | 180 | 2017-03-09T13:50:00 | 724 | 48 | 0.029 |
| 2 | 1 | 240 | 2017-03-09T13:51:00 | 971 | 7 | 0.033 |
| 3 | 1 | 300 | 2017-03-09T13:52:00 | 636 | 37 | 0.028 |


In [262]:
dfBio;

##### ***Function camouflaging bio dataset***

In [57]:
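"""
`camouflageBio(df::DataFrame)`

camouflageBio(df::DataFrame) => DataFrame

Return a camouflaged copy of the bio dataframe `df`: age, height, weight,
start date, and start time are randomly perturbed, and the device serial
number is masked. The input dataframe is not modified.

"""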
function camouflageBio(df::DataFrame)
     
    dfBio = deepcopy(df)
    # Camouflage: age, height, weight, Device Serial Number, Start Date, and
    # Start Time
    # Age    
    dfBio[2, :Value] = string(parse(Int64, dfBio[2, :Value]) -  rand(1:5))
    # Height
    dfBio[4, :Value] = string(round(parse(Float64, dfBio[4, :Value]) + rand(Uniform(0.1, 3.1)), digits = 1))
    dfBio[5, :Value] = round(parse(Float64,dfBio[4, :Value])/2.54, digits= 1) |> string
    # Weight
    dfBio[6, :Value] = string(round(parse(Float64, dfBio[6, :Value]) - rand(Uniform(0.1, 3.1)), digits = 2))
    dfBio[7, :Value] = round(parse(Float64,dfBio[6, :Value])*2.2, digits= 1) |> string
    # Device Serial Number
    dfBio[10, :Value] = "XXX-XXX-X"
    # Start Date
    dfBio[8, :Value] = string(parse(Date, dfBio[8, :Value]) + Dates.Year(rand(3:5)) + Dates.Month(rand(1:6)) + 
                        Dates.Month(rand(1:15)))
    dfBio[8, :Unit] = string("(", string(Dates.dayname(parse(Date, dfBio[8, :Value])))[1:3], ")")
    # Start Time
    dfBio[9, :Value] = string(parse(Time, dfBio[9, :Value]) + Dates.Hour(rand(1:6)) + Dates.Minute(rand(1:30)) )
    
    return dfBio
end

camouflageBio

In [58]:
string("(", string(Dates.dayname(parse(Date, dfBio[8, :Value])))[1:3], ")")

"(Thu)"
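
The weekday tag built in the cell above can also be produced with `Dates.dayabbr`, which returns the abbreviated English day name directly. A minimal sketch, assuming the start date is 2017-03-09 (the date shown in the activity log above); `weekdayTag` is a hypothetical helper name, not part of `ActStatData`:

```julia
using Dates

# Hypothetical helper: format the abbreviated weekday as "(Thu)"
weekdayTag(d::Date) = string("(", Dates.dayabbr(d), ")")

d = Date("2017-03-09")
tagSlice = string("(", Dates.dayname(d)[1:3], ")")  # approach used in the notebook
tagAbbr = weekdayTag(d)                              # same result, without slicing
println(tagSlice, " ", tagAbbr)
```

Both forms give the same string for English day names; `dayabbr` only avoids indexing into the full day name.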

##### ***Test `camouflageBio`***

In [218]:
dfBioSim = camouflageBio(dfBio)

| Row | Setting | Value | Unit |
|---|---|---|---|
| | String? | String31 | String31 |
| 1 | Identity: | 4 | missing |
| 2 | Age: | 22 | years |
| 3 | Gender: | Female | missing |
| 4 | Height: | 161.3 | cm |
| 5 | missing | 63.5 | inches |
| 6 | Weight: | 67.62 | kg |
| 7 | missing | 148.8 | lbs |
| 8 | Start Date: | 2022-03-09 | (Wed) |
| 9 | Start Time: | 20:03:00 | missing |
| 10 | Device Serial Number: | XXX-XXX-X | missing |
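
As a quick consistency check, the inches and lbs rows can be recomputed from the camouflaged cm and kg rows with the same conversions that `camouflageBio` uses. A minimal sketch using the values from the output above; the variable names are illustrative only:

```julia
# Values taken from the camouflaged bio output above
heightCm = 161.3
weightKg = 67.62

# Same conversions as in camouflageBio
heightIn = round(heightCm / 2.54, digits = 1)
weightLbs = round(weightKg * 2.2, digits = 1)

println(heightIn, " inches, ", weightLbs, " lbs")
```

The factor 2.2 is a rounded value (1 kg is about 2.2046 lb), so the pounds entry is approximate by design.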


##### ***Function camouflaging activity dataset***

In [219]:
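"""
`camouflageAct(df::DataFrame, startDate, startTime)`

camouflageAct(df::DataFrame, startDate, startTime) => DataFrame

Return a camouflaged copy of the activity dataframe `df`. Timestamps are
rebuilt from `startDate` and `startTime`; activity counts, steps, and energy
expenditure are scaled by random factors between 0.95 and 1.05; and the
activity intensity is recomputed from the scaled energy expenditure.

"""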
function camouflageAct(df::DataFrame, startDate, startTime)
     
    dfAct = deepcopy(df)
    n = size(dfAct, 1)
    vRand = rand(Uniform(0.95, 1.05), n);
    δ(x) = x > 0;  # indicator function: true when x is positive
    
    # Camouflage: DateTime, ActivityCounts, Steps, EnergyExpenditure, and
    # ActivityIntensity
    # DateTime    
    dfAct.DateTime = parse(DateTime, string(startDate, "T", startTime)) .+ Dates.Minute.(2:n+1)
    # ActivityCounts
    dfAct.ActivityCounts = round.(Int64, dfAct.ActivityCounts.*vRand)
    # Steps
    dfAct.Steps = round.(Int64, dfAct.Steps.*vRand)
    # EnergyExpenditure
    dfAct.EnergyExpenditure = round.(dfAct.EnergyExpenditure.*vRand, digits = 3);
    # ActivityIntensity
    dfAct.ActivityIntensity = trunc.(Int, δ.(dfAct.EnergyExpenditure) .+ δ.(dfAct.EnergyExpenditure .- 0.0309) 
                                     .+ δ.(dfAct.EnergyExpenditure .- 0.0829) .+ 1)
    return dfAct
end

camouflageAct
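
The `ActivityIntensity` column is built by summing indicator functions, so each row gets a level from 1 to 4 depending on how many of the thresholds 0, 0.0309, and 0.0829 its energy expenditure exceeds. A minimal sketch with made-up energy values (illustrative, not taken from the dataset):

```julia
# Indicator function, as in camouflageAct
δ(x) = x > 0

# Made-up energy expenditure values, one per intensity level
energy = [0.0, 0.02, 0.05, 0.1]

# Level = 1 + number of thresholds (0, 0.0309, 0.0829) exceeded
intensity = δ.(energy) .+ δ.(energy .- 0.0309) .+ δ.(energy .- 0.0829) .+ 1

println(intensity)  # [1, 2, 3, 4]
```

Because the intensity is derived from the perturbed energy values, rows close to a threshold may land in a different level than in the original data.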

##### ***Test `camouflageAct`***

In [223]:
dfActSim = camouflageAct(dfAct, dfBioSim.Value[8], dfBioSim.Value[9]);
first(dfActSim, 3)

| Row | Day | ElapsedSeconds | DateTime | ActivityCounts | Steps | EnergyExpenditure |
|---|---|---|---|---|---|---|
| | Int64 | Int64 | DateTime | Int64 | Int64 | Float64 |
| 1 | 1 | 180 | 2022-03-09T20:05:00 | 697 | 46 | 0.028 |
| 2 | 1 | 240 | 2022-03-09T20:06:00 | 964 | 7 | 0.033 |
| 3 | 1 | 300 | 2022-03-09T20:07:00 | 622 | 36 | 0.027 |
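
The camouflaged timestamps follow from the camouflaged start date and start time: row `k` is stamped `k + 1` minutes after the start. A minimal sketch using the start values from the camouflaged bio output above, which reproduces the three `DateTime` values shown:

```julia
using Dates

startDate = "2022-03-09"  # camouflaged Start Date
startTime = "20:03:00"    # camouflaged Start Time
n = 3                     # number of rows to stamp

# Same construction as in camouflageAct
start = parse(DateTime, string(startDate, "T", startTime))
stamps = start .+ Dates.Minute.(2:n+1)

println(stamps)
```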


##### ***Script to generate artificial data***

In [257]:
# Create directory `artificial` if it doesn't exist
if !isdir("../data/artificial")
   mkpath("../data/artificial")
end

# Visit directory to camouflage
dirVisit = "../data/Baseline Visit Data/"

# Get the list of files in the data directory dirVisit
(myData, myHeader) = ActStatData.filesNoNaN(dirVisit);
# Keep the first 100 files
myData = myData[1:100];
myHeader = myHeader[1:100];

for i in  1:100
    # Extract activity dataframe
    dfAct = ActStatData.readActivity(joinpath(dirVisit, myData[i]));
    # Extract bio dataframe
    dfBio = ActStatData.readActivity(joinpath(dirVisit, myHeader[i]));

    # Camouflage Bio
    dfBioSim = camouflageBio(dfBio)

    # Camouflage Activity
    dfActSim = camouflageAct(dfAct, dfBioSim.Value[8], dfBioSim.Value[9])
    
    # Save artificial bio dataset
    fileBio = joinpath("..","data","artificial",string(myData[i][1:3],"_hdr.csv"));
    dfBioSim |> CSV.write(fileBio);

    # Save artificial activity dataset
    fileAct = joinpath("..","data","artificial",string(myData[i][1:3],".csv"));
    dfActSim |> CSV.write(fileAct);
end



There is no missing (i.e. NA) data log in the following directory:
../data/Baseline Visit Data/
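
Because `camouflageBio` and `camouflageAct` draw random numbers, each run of the script produces a different artificial dataset. If reproducible output is wanted, one option (an assumption, not something this notebook does) is to seed the global random number generator before the loop with `Random.seed!`. A minimal sketch; the seed value 1234 is arbitrary:

```julia
using Random

Random.seed!(1234)       # arbitrary seed, chosen for illustration
drawA = rand(1:5, 3)     # e.g. offsets like the age shift in camouflageBio

Random.seed!(1234)       # re-seeding restarts the same random sequence
drawB = rand(1:5, 3)

println(drawA == drawB)  # true
```

The exact numbers drawn for a given seed can differ between Julia versions, so a seed makes reruns repeatable on one setup rather than across all setups.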


In [252]:
pwd()

"C:\\git\\senresearch\\AccelerometerDataProcessing\\notebooks"

In [236]:
 myHeader[i]

'4': ASCII/Unicode U+0034 (category Nd: Number, decimal digit)


In [260]:
myHeader;