In [1]:
# takes ~20sec
use Dan :ALL;
use Text::CSV;

In [2]:
my @lines = csv( in => 'DemographicData.csv' );

my \stats = DataFrame.new( columns => @lines[0], data => @lines[1..*-1] );
~stats[0..5]^;         # head

    Country Name          Country Code  Birth rate  Internet users  Income Group        
 0  Aruba                 ABW           10.244      78.9            High income         
 1  Afghanistan           AFG           35.253      5.9             Low income          
 2  Angola                AGO           45.985      19.1            Upper middle income 
 3  Albania               ALB           12.877      57.2            Upper middle income 
 4  United Arab Emirates  ARE           11.044      88              High income         
 5  Argentina             ARG           17.716      59.9            High income         

Now that my data is loaded, I want to clean it up before analyzing it. Step 1 is to relabel the columns.

In [3]:
# columns is a Hash (you can make one with Zip & Pair operators and a Range, like this...)

stats.columns = <CountryName CountryCode BirthRate InternetUsers IncomeGroup> Z=> 0..∞;

{BirthRate => 2, CountryCode => 1, CountryName => 0, IncomeGroup => 4, InternetUsers => 3}
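To see what `Z=>` is doing on its own, here is a minimal sketch in plain Raku with no Dan involved. The `%columns` variable is just an illustrative name, and I only use three labels to keep it short. The zip stops at the end of the shorter list, so pairing against an infinite Range is safe.

```raku
# Zip a list of labels against an open-ended Range, pairing each with =>
my %columns = <CountryName CountryCode BirthRate> Z=> 0..∞;

say %columns<BirthRate>;    # 2
say %columns.elems;         # 3
```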

In [4]:
stats.dtypes;

CountryName => Str
CountryCode => Str
BirthRate => Str
InternetUsers => Str
IncomeGroup => Str

Oh shoot, it's a bunch of Strings - why didn't the numbers in BirthRate and InternetUsers show up as numbers with type (Int) or (Num)?

Well, in Raku you have control. The default is to store text as, well, text (Str). You can still go ahead and use Str values as numbers in Raku: they are coerced when the math operation is performed.

In [5]:
# here the [+] reduce operation sums all the BirthRates - just works even though the operands are type Str:
[+] stats[*]<BirthRate>

4186.636
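The same coercion can be seen without a DataFrame. This is a small sketch using three of the birth rates from the table above as plain quoted strings; `@birth-rates` and `$total` are names I made up for the illustration.

```raku
# Plain Str values, as a CSV reader would hand them over
my @birth-rates = '10.244', '35.253', '45.985';

say @birth-rates[0].^name;      # Str
my $total = [+] @birth-rates;   # each Str is coerced to a number by +
say $total;                     # 91.482
say $total.^name;               # Rat
```

Note that the inputs stay Str; only the result of the addition is a number.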

Leaving the values untyped may be fine for you - it keeps the original format, avoids unnecessary parsing and "just works"!

BUT - you don't want to reach a point down the road - perhaps after you have sent your results to a colleague - where one of your data entries will not convert because of a data entry error (something like '26λ4' maybe). Also, your machine may be more efficient at storing or processing a number representation.
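As a rough sketch of how such an entry could be spotted early, the snippet below tries to convert each value and keeps the ones that fail. The `@raw` and `@bad` names and the sample values are mine, not part of the dataset, and this is only one of several ways to do the check.

```raku
my @raw = '10.244', '26λ4', '45.985';

# try turns a failed conversion into an undefined value instead of an exception
my @bad = @raw.grep({ !(try .Rat).defined });

say @bad;    # [26λ4]
```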

SO - how can you check the Type of your data when you collect it and coerce it into line?

Step 2 is to cleanse the data by coercing the (Str) values to (Rat)s. (If you haven't heard about Rats before, you will be amazed at the richness of Raku's [numeric](https://docs.raku.org/language/numerics) Types.)

In [6]:
# here we use the .Rat method on each data element in cols 2,3 to coerce it to a Rat:

stats.data[*;2,3].map({$_.=Rat});
stats[*][2].are;

(Rat)
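Here is the same in-place coercion on a single row held in a plain Array, as a sketch that does not depend on Dan. The `@row` variable is an assumption for illustration; the values are taken from the Aruba row above.

```raku
my @row = 'Aruba', 'ABW', '10.244', '78.9', 'High income';

# .= calls the method and assigns the result back into the same element
$_ .= Rat for @row[2, 3];

say @row[2].^name;        # Rat
say @row[2] + @row[3];    # 89.144
say @row[0].^name;        # Str
```

Only the two sliced elements change type; the text columns are left alone.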

So that's cool: Raku is gradually typed, so I can use Types where and when I need them, but otherwise they stay out of the way.
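A tiny sketch of what gradual typing looks like in practice: an untyped variable takes anything, while a variable declared with a type rejects values that do not match. The variable names are made up for this example.

```raku
my $anything = 'text';     # untyped: any value is accepted
$anything = 42;

my Rat $rate = 10.244;     # typed: only Rat values are accepted
my $input = '26λ4';
try { $rate = $input };    # the Str is rejected at assignment time

say $!.^name;              # X::TypeCheck::Assignment
say $rate;                 # 10.244
```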

Here's how to use Raku and Dan types to enforce behaviours and constraints on the data as it is loaded into a typed DataFrame:

In [8]:
# first define some typed Series classes:

class RatSeries {
    has Series $.series handles *;

    method TWEAK { 
        given $.data {
            unless ( .all ~~ Rat ) { 
                die "Data fails to meet {self.^name} constraint." 
            }   
        }   
    }   
}
my $rse = RatSeries.new(series => stats[*]<BirthRate>);

class StrSeries {
    has Series $.series handles *;

    method TWEAK { 
        given $.data {
            unless ( .all ~~ Str ) { 
                die "Data fails to meet {self.^name} constraint." 
            }   
        }   
    }   
}
my $sse = StrSeries.new(series => stats[*]<CountryName>);
$sse ~~ StrSeries;


Then make a custom DataFrame type that checks all the columns:

In [9]:
class DemoDataFrame {
    has DataFrame $.dataframe handles *;

    method TWEAK {
        unless ( 1
            && StrSeries.new(series => self.dataframe[*]<CountryName>)
            && StrSeries.new(series => self.dataframe[*]<CountryCode>)
            && RatSeries.new(series => self.dataframe[*]<BirthRate>)
            && RatSeries.new(series => self.dataframe[*]<InternetUsers>)
            && StrSeries.new(series => self.dataframe[*]<IncomeGroup>)
        ) {
            die "Data fails to meet {self.^name} constraint."
        }
    }
}

# now you can use the custom type to control what goes in a variable:
my DemoDataFrame $ddf .= new(dataframe => stats);
$ddf ~~ DemoDataFrame;

True
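The pattern of validating in `TWEAK` can be tried out without Dan at all. The sketch below uses a hypothetical `CheckedTable` class that holds two plain Arrays in place of DataFrame columns; it is a stand-in I invented to show the idea, not how Dan itself works. The constructor dies if any value has the wrong type, so a variable typed as `CheckedTable` can only ever hold data that passed the check.

```raku
# Hypothetical stand-in for a typed table: one Array per column
class CheckedTable {
    has @.names;
    has @.rates;

    # TWEAK runs after the attributes are set, so it can validate them
    submethod TWEAK {
        die "Data fails to meet {self.^name} constraint."
            unless (@!names.all ~~ Str) && (@!rates.all ~~ Rat);
    }
}

my CheckedTable $table .= new(names => <Aruba Angola>, rates => (10.244, 45.985));
say $table ~~ CheckedTable;    # True

# A stray Str in the numeric column makes construction fail
my $broken = try CheckedTable.new(names => <Aruba Angola>, rates => (10.244, '26λ4'));
say $!.message;                # Data fails to meet CheckedTable constraint.
```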

In [11]:
# or try this as a function signature:

sub print-ddf( DemoDataFrame $x ) {
    say $x.dtypes
}

print-ddf($ddf);

CountryName => Str
CountryCode => Str
BirthRate => Rat
InternetUsers => Rat
IncomeGroup => Str
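To see the signature doing its job, it helps to pass in something of the wrong type. This sketch uses a made-up `Reading` class and `describe` sub rather than the DataFrame types above; the point is only that the call is refused before the body of the sub runs.

```raku
class Reading { has Rat $.value }

sub describe( Reading $r ) { "value is {$r.value}" }

say describe( Reading.new(value => 78.9) );    # value is 78.9

my $raw = '78.9';                 # a bare Str, not a Reading
my $result = try describe($raw);  # binding fails, try returns Nil
say $! ~~ X::TypeCheck;           # True
```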
