# Hello World

### Lesson Objectives

Upon completing this notebook you should be able to:

- Set up your environment to use the Lucata toolchain to compile code.
- Describe the different Lucata tools, including *emu-cc* and *emusim.x*.
- Run a simple Hello World program that spawns Emu threads and then syncs the result.
- Run a simulation with timing that generates statistics.
    - Compare some basic statistics for a naive and a "Lucata-aware" memory layout.

### Environment Setup

We first need to initialize our environment to use the Lucata toolchain. This toolchain allows you to compile Cilk code for x86, for the Lucata simulator, and for hardware execution. This notebook loads the toolchain from the included `.env` file, so the following step is needed only if you want to compile code on the command line.

In [11]:
#This command shows how to set up your environment for command-line execution by sourcing the .env file.  
#The ! indicates that this is a BASH command, and `set -x` prints out verbose output from the execution  

!set -x;. ../.env; set +x

+ . ../.env
++ LUCATA_VERSION=22.09-beta
++ export LUCATA_BASE=/tools/lucata/pathfinder-sw/22.09-beta
++ LUCATA_BASE=/tools/lucata/pathfinder-sw/22.09-beta
++ PATH=/tools/lucata/pathfinder-sw/22.09-beta/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
++ LD_LIBRARY_PATH=/tools/lucata/pathfinder-sw/22.09-beta/lib:/usr/lib64:/usr/lib/x86_64-linux-gnu/
++ USER=
++ echo 'Lucata tools are added to current path from /tools/lucata/pathfinder-sw/22.09-beta'
Lucata tools are added to current path from /tools/lucata/pathfinder-sw/22.09-beta
+ set +x


For this and other notebooks, we will set the following environment variables: a pointer to the user's notebook code directory and a pointer to the Lucata tools.

In [12]:
import os

#Set the path to the latest toolset 
LUCATA_BASE="/tools/emu/pathfinder-sw/22.09-beta" 

#Get the path to where all code samples are
os.environ["USER_NOTEBOOK_CODE"]=os.path.dirname(os.getcwd())
os.environ["PATH"]=os.pathsep.join([os.path.join(LUCATA_BASE,"bin"),os.environ["PATH"]])
os.environ["FLAGS"]="-I"+LUCATA_BASE+"/include/"+" -L"+LUCATA_BASE+"/lib -lmemoryweb"

Check the environment variables we set.

In [13]:
%%bash
printf "Your notebook code folder is at: $USER_NOTEBOOK_CODE\n\n"

printf "Execution path is at: $PATH\n\n"

#Print out the compiler flags we need to use the Lucata memoryweb headers and library
printf "Lucata compilation flags are '$FLAGS'\n\n"

#Print out which Lucata compiler, emu-cc, we are using
printf "Using the Lucata compiler, emu-cc, located at: "
which emu-cc


Your notebook code folder is at: /nethome/mnguyen383/lucata-pathfinder-tutorial/code

Execution path is at: /tools/emu/pathfinder-sw/22.09-beta/bin:/tools/emu/pathfinder-sw/22.09-beta/bin:/nethome/mnguyen383/bin:/nethome/mnguyen383/.local/bin:/opt/slurm/current/bin:/opt/verilator/bin:/opt/riscv-gnu-toolchain/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Lucata compilation flags are '-I/tools/emu/pathfinder-sw/22.09-beta/include/ -L/tools/emu/pathfinder-sw/22.09-beta/lib -lmemoryweb'

Using the Lucata compiler, emu-cc, located at: /tools/emu/pathfinder-sw/22.09-beta/bin/emu-cc
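
The same check can be made from Python, which is handy when a later cell should fail early if the toolchain is missing. This is a minimal sketch using only the standard library; `find_tool` is a hypothetical helper name and not part of the Lucata toolchain.

```python
import shutil


def find_tool(name, search_path=None):
    """Return the full path of an executable, or None if it is not found.

    search_path=None means "search the PATH of the current process".
    """
    return shutil.which(name, path=search_path)


# emu-cc and emusim.x are the two tools used in this notebook.
for tool in ("emu-cc", "emusim.x"):
    location = find_tool(tool)
    print(f"{tool}: {location if location else 'not found on PATH'}")
```

If either tool is reported as not found, re-run the cell that prepends the Lucata `bin` directory to `PATH`.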


## Code Example 1 - Naive Hello World

Here is a "Hello, world" example that introduces some aspects of writing code for the Emu. Your first question might concern how the array allocated with `mw_malloc1dlong` behaves on a distributed system.

Where does `ptr` itself live? Does computing `ptr[k]` cause a migration?

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <cilk.h>

// These are Emu-specific.
#include <memoryweb.h>
#include <timing.h>

static const char str[] = "Hello, world!";

long * ptr;
char * str_out;

int main (void)
{
     // long is the reliable word length, 64-bits.
     const long n = strlen (str) + 1;

     ptr = mw_malloc1dlong (n); // striped across the nodelets
     str_out = malloc (n * sizeof (char)); // entirely on the first nodelet

     /*
      * Start timing here.
      * Profiler settings hidden for simplicity.
      */

     for (long k = 0; k < n; ++k)
          ptr[k] = (long)str[k]; // Remote writes

     for (long k = 0; k < n; ++k)
          str_out[k] = (char)ptr[k]; // Migration and remote write...

     printf("%s\n", str_out);  // Migration back
     
     // Profiler end commands.
     
     return 0;
}
```
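
To make the "striped across the nodelets" comment concrete, here is a small Python sketch of where each element of `ptr` would live. It assumes round-robin striping (element `k` on nodelet `k mod N`) and a hypothetical count of 8 nodelets; both are illustrative assumptions, and `home_nodelet` is not a Lucata API.

```python
def home_nodelet(k, num_nodelets):
    """Assumed round-robin striping: element k lives on nodelet k mod N."""
    return k % num_nodelets


text = "Hello, world!"
n = len(text) + 1  # +1 for the terminating NUL, as in the C code
NUM_NODELETS = 8   # hypothetical value, chosen only for illustration

placement = [home_nodelet(k, NUM_NODELETS) for k in range(n)]
for k, nodelet in enumerate(placement):
    char = text[k] if k < len(text) else "\\0"
    print(f"ptr[{k:2d}] ({char!r}) -> nodelet {nodelet}")
```

Under this assumption, consecutive elements sit on different nodelets, so a single thread that walks the array in order has to visit a new nodelet at almost every step.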



###  Compilation and simulation for the Pathfinder
We'll test compiling this example to show the syntax and then move on to a more optimized example. Note that the .mwx output can be used for simulation and execution on the Pathfinder system. 

We use `emu-cc` to compile and `emusim.x` to run a SystemC simulation of the application running on a Pathfinder with the specified memory and node parameters.

It is also important to understand the following details:  
* We defined $FLAGS up above to include the Lucata headers and libraries  
* `emusim.x` takes a few parameters, including the memory size (`-m 24` for 2^24 bytes, which is 16 MiB) and the number of nodes to simulate (`--total_nodes <N>`). You can try changing the memory size (24 to 38) or the number of nodes (a power of two between 1 and 32) to see how the simulation changes.
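
The relationship between the `-m` value and the simulated memory can be checked with a little arithmetic. The sketch below assumes that `-m` is the base-2 logarithm of the bytes per MSP and that there is one logical MSP per node, which is what the `.cdc` reports later in this notebook show (`Log2 Memory Size per MSP=24`, `Logical MSPs per Node=1`). The function name `total_memory_mib` is hypothetical.

```python
def total_memory_mib(log2_bytes_per_msp, nodes, msps_per_node=1):
    """Total simulated memory in MiB, assuming 2**m bytes per MSP."""
    total_bytes = (2 ** log2_bytes_per_msp) * msps_per_node * nodes
    return total_bytes // (1024 * 1024)


for nodes in (1, 4):
    print(f"-m 24, {nodes} node(s): {total_memory_mib(24, nodes)} MiB")
```

These values agree with the `Total Memory (in MiB)` lines reported for the 1-node and 4-node runs in this notebook (16 and 64).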

In [14]:
%%bash
emu-cc -o hello-world-naive.mwx $FLAGS hello-world-naive.c
ls *.mwx

hello-world-naive.mwx


/net/tools/emu/pathfinder-sw/22.09-beta/bin/clang: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/clang)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/clang-6.0: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/clang-6.0)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/opt: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/opt)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/opt: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/opt)


In [15]:
%%bash
#Run a basic simulation with memory size 2^24, one node, and the naive Hello World executable as input
emusim.x -m 24 --total_nodes 1 -- hello-world-naive.mwx

Start untimed simulation with local date and time= Thu Nov 17 12:37:34 2022

Timed simulation starting...
Hello, world!
End untimed simulation with local date and time= Thu Nov 17 12:37:37 2022


Info: /OSCI/SystemC: Simulation stopped by user.



        SystemC 2.3.3-Accellera --- Sep  7 2022 09:15:59
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED


## Code Example 2 - Hello World with Replication

With the Lucata architecture, we often want to avoid spurious migrations by replicating data across nodes so that each node has a copy of the data it needs. The improved sample in `hello-world/hello-world.c` demonstrates the use of the `replicated` type:

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <cilk.h>

// These are Emu-specific.
#include <memoryweb.h>
#include <timing.h>

static const char str[] = "Hello, world!";

replicated long * ptr;
replicated char * str_out;

int main (void)
{
     // long is the reliable word length, 64-bits.
     const long n = strlen (str) + 1;

     // Allocating a copy of data on each nodelet typically reduces migrations for commonly accessed elements 
     mw_replicated_init ((long*)&ptr, (long)mw_malloc1dlong (n));
     mw_replicated_init ((long*)&str_out, (long)malloc (n * sizeof (char)));
     
    
     /*
      * Start timing here.
      * Profiler settings hidden for simplicity.
      */
    
     for (long k = 0; k < n; ++k)
          ptr[k] = (long)str[k]; // Remote writes

     for (long k = 0; k < n; ++k)
          str_out[k] = (char)ptr[k]; // Migration and remote write

     printf("%s\n", str_out);  // Migration back
    
     // Profiler end commands.
    
     return 0;
}
``` 
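
The effect of replicating the pointer can be illustrated with a toy model of the read loop. This is only a sketch under loud assumptions: round-robin striping of `ptr`, 8 nodelets, a non-replicated pointer variable that lives on nodelet 0 and is re-read on every iteration, and a thread that must be on a nodelet to read data there. It is not the simulator's behavior, and `count_migrations` is a hypothetical name; real counts come from `emusim.x`.

```python
def count_migrations(n, num_nodelets, pointer_replicated):
    """Count thread moves in a toy model of reading ptr[0..n-1] in order."""
    here = 0   # the thread starts on nodelet 0
    moves = 0
    for k in range(n):
        if not pointer_replicated and here != 0:
            # Assumption: the pointer variable itself lives only on
            # nodelet 0, so the thread returns there to read it.
            moves += 1
            here = 0
        target = k % num_nodelets  # assumed round-robin striping
        if here != target:
            moves += 1
            here = target
    return moves


n = len("Hello, world!") + 1
naive = count_migrations(n, 8, pointer_replicated=False)
replicated = count_migrations(n, 8, pointer_replicated=True)
print(f"toy model, naive pointer:      {naive} moves")
print(f"toy model, replicated pointer: {replicated} moves")
```

In this model the replicated pointer removes every trip back to nodelet 0, which is the kind of spurious migration the `replicated` type is meant to avoid.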

### Compiling and Simulating Hello World with Replication

Here we show how to compile the "Lucata-aware" hello world example and to run it with the simulator. 

In [16]:
%%bash
emu-cc -o hello-world.mwx $FLAGS hello-world.c
ls *.mwx

hello-world.mwx
hello-world-naive.mwx


/net/tools/emu/pathfinder-sw/22.09-beta/bin/clang: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/clang)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/clang-6.0: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/clang-6.0)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/opt: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/opt)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/opt: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/opt)


In [17]:
%%bash
emusim.x -m 24 --total_nodes 1 -- hello-world.mwx

Start untimed simulation with local date and time= Thu Nov 17 12:38:27 2022

Timed simulation starting...
Hello, world!
End untimed simulation with local date and time= Thu Nov 17 12:38:31 2022


Info: /OSCI/SystemC: Simulation stopped by user.



        SystemC 2.3.3-Accellera --- Sep  7 2022 09:15:59
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED


## Hello World Spawn Example

The previous example kept a single thread alive, migrating between nodelets. The next example, `hello-world-spawn.c`, uses Cilk's thread-spawning construct, `cilk_spawn`, along with replicated memory.

```c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <cilk.h>

#include <memoryweb.h>
#include <timing.h>

const char str[] = "Hello, world!";

static inline void copy_ptr (char *pc, const long *pl) { *pc = (char)*pl; }

replicated long * ptr;
replicated char * str_out;

int main (void)
{
     long n = strlen (str) + 1;

     mw_replicated_init ((long*)&ptr, (long)mw_malloc1dlong (n));
     mw_replicated_init ((long*)&str_out, (long)malloc (n * sizeof (char)));

     /*
      * Start timing here.
      * Profiler settings hidden for simplicity
      */

     for (long k = 0; k < n; ++k)
          ptr[k] = (long)str[k]; // Remote writes

     for (long k = 0; k < n; ++k)
          cilk_spawn copy_ptr (&str_out[k], &ptr[k]);

     cilk_sync;
    
     printf("%s\n", str_out);  // Migration back
     
     // Profiler end commands.
    
     return 0;
}
```
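
For readers new to the spawn/sync pattern, the following Python sketch mirrors the structure of the loop above using a thread pool. It is an analogy only: `ThreadPoolExecutor` stands in for `cilk_spawn` and waiting on the futures stands in for `cilk_sync`; nothing here models Emu threads, nodelets, or migrations.

```python
from concurrent.futures import ThreadPoolExecutor

text = "Hello, world!\0"           # includes the terminating NUL, like the C string
ptr = [ord(c) for c in text]       # stands in for the array of longs
str_out = [""] * len(ptr)          # stands in for the output buffer


def copy_ptr(k):
    """Copy one element, like copy_ptr (&str_out[k], &ptr[k]) in the C code."""
    str_out[k] = chr(ptr[k])


with ThreadPoolExecutor() as pool:
    # One task per element, analogous to one cilk_spawn per loop iteration.
    futures = [pool.submit(copy_ptr, k) for k in range(len(ptr))]
    # Waiting for every task is analogous to cilk_sync.
    for future in futures:
        future.result()

result = "".join(str_out).rstrip("\0")
print(result)
```

The loop launches 14 tasks, one per character including the terminator, which matches the `created=14` thread count in the `.cdc` report for this example.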

In [19]:
%%bash

#Compile the code
emu-cc -o hello-world-spawn.mwx $FLAGS hello-world-spawn.c
#Note that we are simulating this with at least 4 nodes! This should give us different statistics than the previous examples
emusim.x -m 24 --total_nodes 4 -- hello-world-spawn.mwx
#Then we can print out all the output files that were generated.
ls hello-world-spawn.*

Start untimed simulation with local date and time= Thu Nov 17 12:41:13 2022

Hello, world!
End untimed simulation with local date and time= Thu Nov 17 12:41:13 2022

hello-world-spawn.c
hello-world-spawn.cdc
hello-world-spawn.mwx


/net/tools/emu/pathfinder-sw/22.09-beta/bin/clang: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/clang)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/clang-6.0: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/clang-6.0)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/opt: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/opt)
/net/tools/emu/pathfinder-sw/22.09-beta/bin/opt: /usr/lib64/libtinfo.so.5: no version information available (required by /net/tools/emu/pathfinder-sw/22.09-beta/bin/opt)

        SystemC 2.3.3-Accellera --- Sep  7 2022 09:15:59
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED


### Simple Comparison  
We can now compare the statistics reported for the replicated Hello World and the spawn Hello World. The `*.cdc` files contain basic statistics about the simulated system that change with the memory allocation type (naive or replicated) and the system size (1 to 8 nodes).

In [16]:
%%bash
#Print out all the .cdc files we generated
ls *.cdc

hello-world.cdc
hello-world-naive.cdc
hello-world-spawn.cdc


In [17]:
%%bash
less hello-world-spawn.cdc

************************************************
Program Name/Arguments: 
hello-world-spawn.mwx 
************************************************
Simulator Version: 22.8.31-lubase
************************************************
Configuration Details:
Ring Model = Stratix: 3 GC Clusters, 8 MSPs
Number of Nodes=4
Total Memory (in MiB)=64
Logical MSPs per Node=1
Log2 Memory Size per MSP=24
GC Clusters per Node=3
GCs per Cluster=8
************************************************
Emu system run time 0.000656 sec==655608000 ps
System thread counts:
	active=0, created=14, died=14,
	max live=3 first occurred @58314168 ps with prog 8.89% complete
	and last occurred @58314168 ps with prog 12.8% complete
************************************************
************************************************
Simulator wall clock time (seconds): 24


Note what changes in this file between the normal "replicated" Hello World and the "replicated+spawn" version of the code.  
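* We simulate with a different number of nodes, so the total simulated memory differs (16 MiB for 1 node versus 64 MiB for 4 nodes).
* The larger simulation takes longer to run (24 s versus 4 s of wall clock time) and reports different statistics for the thread counts (`created`, `died`, `max live`) and progression.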

In [18]:
%%bash
! diff hello-world.cdc hello-world-spawn.cdc

3c3
< hello-world.mwx 
---
> hello-world-spawn.mwx 
9,10c9,10
< Number of Nodes=1
< Total Memory (in MiB)=16
---
> Number of Nodes=4
> Total Memory (in MiB)=64
16c16
< Emu system run time 0.000606 sec==605604000 ps
---
> Emu system run time 0.000656 sec==655608000 ps
18,20c18,20
< 	active=0, created=0, died=0,
< 	max live=0 first occurred @0 s with prog 0% complete
< 	and last occurred @0 s with prog 0% complete
---
> 	active=0, created=14, died=14,
> 	max live=3 first occurred @58314168 ps with prog 8.89% complete
> 	and last occurred @58314168 ps with prog 12.8% complete
23c23
< Simulator wall clock time (seconds): 4
---
> Simulator wall clock time (seconds): 24
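
When comparing more than two runs, it can be easier to pull the key numbers out of each `.cdc` file programmatically. The sketch below parses an excerpt copied from the `hello-world-spawn.cdc` output shown earlier; the regular expressions are written against that excerpt only and are an assumption about the format, not a documented specification. `parse_cdc` is a hypothetical helper name.

```python
import re

# Excerpt copied from the hello-world-spawn.cdc output shown in this notebook.
SAMPLE_CDC = """\
Number of Nodes=4
Total Memory (in MiB)=64
Emu system run time 0.000656 sec==655608000 ps
System thread counts:
\tactive=0, created=14, died=14,
Simulator wall clock time (seconds): 24
"""

# Patterns inferred from the excerpt above; other .cdc fields are ignored.
PATTERNS = {
    "nodes": r"Number of Nodes=(\d+)",
    "memory_mib": r"Total Memory \(in MiB\)=(\d+)",
    "run_time_ps": r"sec==(\d+) ps",
    "threads_created": r"created=(\d+)",
    "wall_clock_s": r"wall clock time \(seconds\): (\d+)",
}


def parse_cdc(text):
    """Return a dict of the integer statistics found in a .cdc report."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[name] = int(match.group(1))
    return stats


stats = parse_cdc(SAMPLE_CDC)
for name, value in stats.items():
    print(f"{name}: {value}")
```

To use it on a real file, read the file contents with `open("hello-world-spawn.cdc").read()` and pass the text to `parse_cdc`.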


### Cleanup  
Finally, we can clean up the code directory and the output files using the Makefile included in this directory. Run the cell below to clean the working directory.

In [10]:
!make clean

rm -f *.mwx *.tqd *.cdc *.vsf *.mps *.uis *.csv *.hpc; \
./helpers/backup_imgs.sh


### Exercises
To further your understanding of this topic we encourage you to try the following:  
1) Restart the notebook and change the memory size and the number of nodes that are simulated. How do the statistics change?  
2) Investigate the other output files like the `.mps` file and understand how they are different for different applications.  
More details on visualizing these files are in the next Notebook.