In [1]:
use Dan :ALL;  #~20 sec on first run

### A regular Dan DataFrame with Index

In [2]:
my $dfRand = DataFrame.new( [[rand xx 5] xx 6], columns => <A B C D E>, index => ['a'..'zzz'] );
~$dfRand;

    A                     B                    C                    D                    E                   
 a  0.4866142040249799    0.7363155572749766   0.5288816103734105   0.8350591532040785   0.546363563089522   
 b  0.7936416052169518    0.9226721118031563   0.46524562452747487  0.2940576630417491   0.37513752016941826 
 c  0.042935923351063554  0.44247578262843124  0.9057480210943761   0.19047867970980958  0.6366627576105072  
 d  0.07264867658177565   0.1230300329329237   0.8706552608672291   0.09366807143396572  0.3194418866369073  
 e  0.8864118597835392    0.2369093002965429   0.3374878632056312   0.7826715276731341   0.7137242536931894  
 f  0.8747840939846101    0.10007796754266307  0.8830142801234392   0.6756487979640361   0.8206295942472142  

In [3]:
$dfRand[3]{'C'..'E'};

(0.8706552608672291 0.09366807143396572 0.3194418866369073)

In [4]:
$dfRand<d>{'C'..'E'};

(0.8706552608672291 0.09366807143396572 0.3194418866369073)
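Both calls return the same row. The positional form picks row 3 (counting from 0), and the label form reaches it through the index, where `'d'` sits at position 3. The plain-Raku sketch below illustrates that idea with a label-to-position Hash. It is a conceptual illustration on made-up numbers, not Dan's actual implementation, and the names `@grid`, `%pos` and `%col` are hypothetical.

```raku
# Made-up 6x5 grid of numbers: row r holds r*10+1 .. r*10+5
my @grid = (1..6).map(-> $r { [ (1..5).map({ $r * 10 + $_ }) ] });

# Hypothetical row index: label => row position ('a' => 0, ..., 'f' => 5)
my %pos = ('a'..'f').antipairs;

# The same idea for columns: name => column position
my %col = <A B C D E>.antipairs;

# Positional access and label access reach the same cells
my @by-position = @grid[3][ %col<C D E> ];
my @by-label    = @grid[ %pos<d> ][ %col<C D E> ];

say @by-position;   # [43 44 45]
say @by-label;      # [43 44 45]
```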

### Dan::DataFrame and Data::Generators I

Let's mix in https://github.com/antononcube/Raku-Data-Generators and https://github.com/antononcube/Raku-Data-Reshapers
per the Reddit chat https://www.reddit.com/r/rakulang/comments/vj27i5/raku_dan_tprc_talk/

For now, this example uses only the Data::* modules (Data::Generators and Data::Reshapers), without Dan...

In [5]:
use Data::Generators;  #~20 sec on first run
use Data::Reshapers;

In [6]:
# Make a dataset
my @dsRand = random-tabular-dataset(6, <A B C D E>);
dd @dsRand;

Array @dsRand = [{:A("inconspicuously"), :B(14.839455599719894e0), :C(19.396767406247484e0), :D("Tchaikovsky"), :E("0IRmpUtPG")}, {:A("blurt"), :B(9.392448221864141e0), :C(18.519524932859788e0), :D("gossiping"), :E("ooPgSB")}, {:A("Penobscot"), :B(13.79602322026288e0), :C(23.34156489930512e0), :D("laughing"), :E("AGwd")}, {:A("swan-neck"), :B(4.553218901024791e0), :C(18.500570771409983e0), :D("kangaroo"), :E("9bpfoiYGekI4qOcmP7")}, {:A("arbitrable"), :B(17.374799210500562e0), :C(12.36901055969075e0), :D("slim"), :E("kzww")}, {:A("dichromia"), :B(-3.795956656387391e0), :C(5.78374358610828e0), :D("postoperative"), :E("ZBVfiCql")}]


This shows the raw dataset produced by Data::Generators: it is just an Array of Hashes, each Hash holding the `column-name => value` Pairs for one row.

Remark: The signature design and implementation of `random-tabular-dataset` are based on the Mathematica function RandomTabularDataset, [AAf3].
For Mathematica's tabular data structure, see https://reference.wolfram.com/language/ref/Dataset.html
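To make that shape concrete, here is a tiny hand-built dataset with the same structure (an Array of Hashes keyed by column name), plus the two most common ways to read it: a single cell, and a whole column. The values are made up for illustration and are not output of `random-tabular-dataset`.

```raku
# A tiny hand-made dataset in the same shape as random-tabular-dataset output:
# an Array of rows, each row a Hash of column-name => value
my @ds = (
    { A => 'blurt',     B => 9.39,  C => 18.52 },
    { A => 'Penobscot', B => 13.80, C => 23.34 },
    { A => 'dichromia', B => -3.80, C => 5.78 },
);

# One cell: row 1, column B
say @ds[1]<B>;              # 13.8

# One whole column: pull key B out of every row, in row order
my @colB = @ds.map(*<B>);
say @colB;                  # [9.39 13.8 -3.8]
```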

In [7]:
# Show first dataset
to-pretty-table(@dsRand, field-names => <A B C D E>);

+-----------------+-----------+-----------+---------------+--------------------+
|        A        |     B     |     C     |       D       |         E          |
+-----------------+-----------+-----------+---------------+--------------------+
| inconspicuously | 14.839456 | 19.396767 |  Tchaikovsky  |     0IRmpUtPG      |
|      blurt      |  9.392448 | 18.519525 |   gossiping   |       ooPgSB       |
|    Penobscot    | 13.796023 | 23.341565 |    laughing   |        AGwd        |
|    swan-neck    |  4.553219 | 18.500571 |    kangaroo   | 9bpfoiYGekI4qOcmP7 |
|    arbitrable   | 17.374799 | 12.369011 |      slim     |        kzww        |
|    dichromia    | -3.795957 |  5.783744 | postoperative |      ZBVfiCql      |
+-----------------+-----------+-----------+---------------+--------------------+

In [8]:
# Endow the dataset with row-names
my %dsRand = 'a'..'f' Z=> @dsRand;

# Show second dataset
to-pretty-table(%dsRand, field-names => <A B C D E>);

+---+-----------------+-----------+-----------+---------------+--------------------+
|   |        A        |     B     |     C     |       D       |         E          |
+---+-----------------+-----------+-----------+---------------+--------------------+
| a | inconspicuously | 14.839456 | 19.396767 |  Tchaikovsky  |     0IRmpUtPG      |
| b |      blurt      |  9.392448 | 18.519525 |   gossiping   |       ooPgSB       |
| c |    Penobscot    | 13.796023 | 23.341565 |    laughing   |        AGwd        |
| d |    swan-neck    |  4.553219 | 18.500571 |    kangaroo   | 9bpfoiYGekI4qOcmP7 |
| e |    arbitrable   | 17.374799 | 12.369011 |      slim     |        kzww        |
| f |    dichromia    | -3.795957 |  5.783744 | postoperative |      ZBVfiCql      |
+---+-----------------+-----------+-----------+---------------+--------------------+

### Dan::DataFrame and Data::Generators II

Now let's try making a Dan::DataFrame with Data::Generators...

In [9]:
my @tbl = get-titanic-dataset();
my $res = cross-tabulate( @tbl, 'passengerSex', 'passengerClass');

{female => {1st => 144, 2nd => 106, 3rd => 216}, male => {1st => 179, 2nd => 171, 3rd => 493}}
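`cross-tabulate` counts how many rows share each combination of the two chosen fields and, as the output above shows, returns a Hash of Hashes (first field value => second field value => count). The same counting can be sketched in plain Raku. This is a hand-rolled illustration on a few made-up rows, not the Data::Reshapers implementation.

```raku
# A few made-up rows carrying the two fields used above
my @passengers = (
    { passengerSex => 'female', passengerClass => '1st' },
    { passengerSex => 'male',   passengerClass => '3rd' },
    { passengerSex => 'male',   passengerClass => '3rd' },
    { passengerSex => 'female', passengerClass => '2nd' },
    { passengerSex => 'male',   passengerClass => '1st' },
);

# Hash of Hashes: sex => class => count (inner Hashes autovivify on ++)
my %ct;
for @passengers -> %r {
    %ct{ %r<passengerSex> }{ %r<passengerClass> }++;
}

say %ct<male><3rd>;     # 2
say %ct<female><1st>;   # 1
```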

In [10]:
my @colnames = <A B C D E>;
my @dsRand2 = random-tabular-dataset(6, @colnames);
to-pretty-table(@dsRand2);

+--------------+----------------+-----------+---------------+-----------+
|      C       |       E        |     A     |       D       |     B     |
+--------------+----------------+-----------+---------------+-----------+
|    unload    |   angledozer   |   crater  | antispasmodic | 15.418379 |
| Chickamauga  |   tax-exempt   |   yo-yo   |  Bartlesville | 14.762477 |
|   drinker    |   gloatingly   | pseudopod |  hyperidrosis |  9.948859 |
|    taster    | porcupinefish  | homewards |  wollastonite |  1.536198 |
| Barranquilla | platitudinize  |   Exacum  |    overpay    | 11.788930 |
|    Carew     | foster-brother | psoriasis |      bill     | 21.082479 |
+--------------+----------------+-----------+---------------+-----------+

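Remark: the dataset model (an Array of Hashes) does not preserve column order, so `to-pretty-table` lists the columns in arbitrary order unless `field-names` is given, as in cell [7].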
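Because each row is a Hash, its keys come back in no guaranteed order. Keeping the column names in a separate list, as `@colnames` does here, is enough to read a row back in a fixed order via a Hash slice. A minimal plain-Raku sketch, using the first row above with B rounded for readability:

```raku
my @colnames = <A B C D E>;

# One row (first row above, B rounded); Hash key order is not guaranteed
my %row = E => 'angledozer', C => 'unload', A => 'crater',
          D => 'antispasmodic', B => 15.42;

say %row.keys;              # some order, not necessarily A B C D E

# Slice the Hash with the column list to get the values in column order
my @ordered = %row{ @colnames };
say @ordered;               # [crater 15.42 unload antispasmodic angledozer]
```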

In [11]:
@dsRand2

[{A => crater, B => 15.418379091678371, C => unload, D => antispasmodic, E => angledozer} {A => yo-yo, B => 14.762476868666774, C => Chickamauga, D => Bartlesville, E => tax-exempt} {A => pseudopod, B => 9.948859469216888, C => drinker, D => hyperidrosis, E => gloatingly} {A => homewards, B => 1.5361975003623574, C => taster, D => wollastonite, E => porcupinefish} {A => Exacum, B => 11.788929649546072, C => Barranquilla, D => overpay, E => platitudinize} {A => psoriasis, B => 21.082478592526073, C => Carew, D => bill, E => foster-brother}]

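We need a little routine to unpack the dataset into an Array of Pairs of Arrays (one `column-name => [values]` Pair per column, in `@colnames` order), which is the form passed to `DataFrame.new( data => ... )` below...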

In [16]:
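sub unpack( @dataset, @colnames ) {
    my ( %hoa, @aop );     # hash of arrays (column => values), array of pairs

    for @colnames -> $cn {
        # collect column $cn from every row, keeping the row order
        for @dataset -> %rh {
            %hoa{$cn}.push: %rh{$cn}
        }
        # emit the column as a Pair, keeping the @colnames order
        @aop.push: $cn => %hoa{$cn}
    }
    @aop
}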

&unpack
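To see what the routine returns, here is a self-contained run on a two-row toy dataset with made-up values. It uses a more compact, `map`-based variant that should produce the same Pairs; the name `unpack-columns` is hypothetical and chosen only to avoid clashing with `unpack` above.

```raku
# Compact variant of the unpack routine above (hypothetical name):
# for each column name, collect that key from every row, in row order
sub unpack-columns( @dataset, @colnames ) {
    @colnames.map( -> $cn {
        $cn => @dataset.map( -> %r { %r{$cn} } ).Array
    } ).Array
}

my @toy = (
    { A => 'crater', B => 1.5 },
    { A => 'yo-yo',  B => 2.5 },
);

my @aop = unpack-columns( @toy, <A B> );

# One line per column: name, then its values
say "{.key}: {.value}" for @aop;
```

This prints `A: crater yo-yo` and then `B: 1.5 2.5`. The `map` form builds a fresh Array per column instead of going through an intermediate Hash of Arrays, which keeps the column order tied directly to `@colnames`.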

In [17]:
my $danRand = DataFrame.new( data => @dsRand2.&unpack( @colnames ) );
$danRand.ix: 'a'..'zz';
~$danRand;

    A          B                   C             D              E              
 a  crater     15.418379091678371  unload        antispasmodic  angledozer     
 b  yo-yo      14.762476868666774  Chickamauga   Bartlesville   tax-exempt     
 c  pseudopod  9.948859469216888   drinker       hyperidrosis   gloatingly     
 d  homewards  1.5361975003623574  taster        wollastonite   porcupinefish  
 e  Exacum     11.788929649546072  Barranquilla  overpay        platitudinize  
 f  psoriasis  21.082478592526073  Carew         bill           foster-brother 

### How a 'no-index' Dan::DataFrame might look

In [None]:
my $row-index = Series.new( name => 'rix', ['a'..'f'] );
~$row-index;

In [None]:
my $nxRand = DataFrame.new( [[rand xx 5] xx 6], columns => <A B C D E> );
~$nxRand;

In [None]:
$nxRand.splice( :ax, 0, 0, [$row-index,] );
~$nxRand;

In [None]:
$nxRand.series('rix')[3];

In [None]:
# which means we would need some monster "accessor" - yikes!!
~$nxRand.grep({$nxRand.series('rix')[$++] eq 'd'})[*]{'C'..'E'};

Conclusion: we like having row indexes...