AsyncBril #305
Conversation
Hey! This is a really cool project! I'm glad that overall, the Rust code base was fairly extensible for supporting your extension. I created a PR, #306, to help address one of your failing CI checks. I think the other is an easy fix. If you don't mind, I have a few suggestions/questions about your extension. It would be helpful if your extension was testable with …
@@ -286,6 +294,7 @@ fn type_check_instruction<'a>(
check_asmt_type(ty, &expected_arg.arg_type)
})?;

// print to console
Unclear?
check_num_funcs(0, funcs)?;
check_num_labels(0, labels)?;
check_asmt_type(&Type::Int, get_type(env, 0, args)?)?;
update_env(env, dest, op_type)
I think you forgot to check the `op_type` here, as is done in all the other cases?
@@ -400,7 +497,11 @@ fn type_check_instruction<'a>(
Some(t) => {
  check_num_args(1, args)?;
  let ty0 = get_type(env, 0, args)?;
  check_asmt_type(t, ty0)
  if let Some(promise_t) = promise {
It's unclear to me why the `promise` variable needs to be threaded through as an argument to `type_check_instruction`. Isn't this information available inside of `t` here? If so, that would be simpler.
use fxhash::FxHashMap;
use std::collections::HashMap;
`FxHashMap` should just be a drop-in replacement, no?
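For what it's worth, a minimal standalone sketch of the swap (the map contents here are made up for illustration):

```rust
use fxhash::FxHashMap;

fn main() {
    // FxHashMap is std's HashMap with a faster (non-DoS-resistant) hasher,
    // so existing insert/get/remove call sites should compile unchanged.
    let mut handlers: FxHashMap<usize, &str> = FxHashMap::default();
    handlers.insert(0, "thread-handle");
    assert_eq!(handlers.remove(&0), Some("thread-handle"));
}
```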
@@ -6,6 +6,7 @@ use std::fs::File;
use std::io::Read;

fn main() {
  std::env::set_var("RUST_BACKTRACE", "1");
FYI, this can be set on the command line (e.g., `RUST_BACKTRACE=1 brilirs ...`) without needing to modify the code.
@@ -0,0 +1,81 @@
Asynchronous Bril
Is this extra file intended?
@@ -201,12 +250,17 @@ impl fmt::Display for Value {
Self::Float(v) => write!(f, "{v:.17}"),
Self::Char(c) => write!(f, "{c}"),
Self::Pointer(p) => write!(f, "{p:?}"),
Self::Atomic(a) => write!(f, "atomic_{a}"),
In the memory docs, they specify that the result of printing a pointer is unspecified (an implementation can do anything except error). Is that the same here?
out: &Mutex<T>,
val: &Value,
) -> Result<(), std::io::Error> {
let mut out = out.lock().unwrap();
Maybe a nit, but there isn't really a reason to change this function, right? Can we acquire the lock first and then call this function, as a separation of concerns? That is equivalent to what you would need to do if you were using `Display` instead.
This is a fine design decision, but at the risk of overstepping, I think this is unnecessarily restrictive. Right now, promises need to get cleaned up at the end of each function call, but intuitively, there isn't any reason they can't span function boundaries, right? Another approach is for these promises to have similar semantics to memory: everything needs to be cleaned up by the time the program terminates.
@@ -87,17 +91,61 @@ impl Environment {
}
}

// A second heap for atomics, since AtomicI64 cannot be added to the Value enum without a lot of work
Purely out of curiosity, did you consider making them a part of the `Value` enum more seriously? Let's say that `Value` didn't implement `Copy` and you modeled these atomics as a variant holding an `Arc<AtomicI64>` instead of this map; would that be equivalent? What would the implications of that design decision be?
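To make the question concrete, here is a hypothetical standalone sketch of that alternative (not brilirs's actual code):

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

// Hypothetical: Value gives up Copy and derives Clone instead, so an
// atomic can live directly in the enum behind an Arc.
#[derive(Clone)]
#[allow(dead_code)]
enum Value {
    Int(i64),
    Atomic(Arc<AtomicI64>),
}

fn main() {
    let v = Value::Atomic(Arc::new(AtomicI64::new(1)));
    let w = v.clone(); // cloning the Value clones the Arc: both share one atomic
    if let (Value::Atomic(a), Value::Atomic(b)) = (v, w) {
        a.store(2, Ordering::SeqCst);
        assert_eq!(b.load(Ordering::SeqCst), 2);
    }
}
```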
state.env.set(dest, Value::Int(res));
}
LoadAtomic => {
let arg0 = get_arg::<usize>(&state.env, 0, args) as usize;
Unnecessary `as usize` cast. I think clippy has some suggestions that could be applied.
@@ -172,10 +221,10 @@ enum Value {
Float(f64),
Char(char),
Pointer(Pointer),
Atomic(usize),
You might consider using the "NewType" idiom here to not mix this `usize` with the `usize` used for variable names.
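For instance, a sketch of that idiom (the wrapper name `AtomicIdx` is hypothetical):

```rust
// A NewType wrapper gives the atomics-heap index its own distinct type,
// so the compiler rejects accidental mixing with variable-name indices.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct AtomicIdx(usize);

#[allow(dead_code)]
enum Value {
    Int(i64),
    Atomic(AtomicIdx), // instead of Atomic(usize)
}

fn main() {
    let v = Value::Atomic(AtomicIdx(3));
    if let Value::Atomic(AtomicIdx(i)) = v {
        println!("atomic slot {i}");
    }
}
```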
op: bril_rs::EffectOps,
args: &[usize],
funcs: &[usize],
curr_block: &BasicBlock,
// There are two output variables where values are stored to effect the loop execution.
next_block_idx: &mut Option<usize>,
result: &mut Option<Value>,
out: Arc<Mutex<T>>,
stop_flag: Option<Arc<AtomicBool>>,
This `stop_flag` is somewhat mysterious to me. Why do we need to do this when the program ends up panicking anyways? I think this could be implemented as `&Option<AtomicBool>`.
Could this fit into `State`?
@@ -745,6 +926,8 @@ fn parse_args(
Ok(())
}
bril_rs::Type::Pointer(..) => unreachable!(),
bril_rs::Type::Promise(..) => unreachable!("line 839"),
bril_rs::Type::AtomicInt => unreachable!("line 840"),
What are these line numbers in reference to?
Hmmm, this seems scary. I'm concerned that it's not just thread-safety but that there is also a possibility of memory corruption. Did you try profiling this to see what the penalty is? One observation: pointers can't be used to write to different memory blocks, so you actually might be able to write to multiple memory regions simultaneously without contention. I guess you only need to take a whole-heap write lock during alloc/free? Maybe something smart can be done here... memory allocation is by far brilirs's least optimized part.
@@ -516,19 +622,58 @@ fn execute_value_op<T: std::io::Write>(
let res = Value::Pointer(arg0.add(arg1));
state.env.set(dest, res);
}
Resolve => {
let res = match handlers.remove(&args[0]) {
When I first saw this line, I thought it was a bug, since you are using the variable name (usize'd index) to find the handler, which also happens to be a usize. I realize looking at the rest of this code that it probably isn't.
Still, this seems like a serious hit to modularity: each promise is tied to the variable name it is bound to, so you can't call `id` on a promise, pass it as an argument, or store it in an array. I think this should be documented. For what it's worth, I think this is a straightforward limitation to lift, though related to your scoping restriction somewhat.
Resolve => {
let res = match handlers.remove(&args[0]) {
Some(handle) => handle.0.join().unwrap(),
None => panic!("Unexpected execution error"),
Can we use a `Result` error instead, with a more descriptive error message?
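Something in this direction, perhaps; this standalone sketch uses a hypothetical `InterpError` stand-in rather than brilirs's actual error type:

```rust
use std::collections::HashMap;
use std::thread::JoinHandle;

// Hypothetical stand-in for brilirs's real interpreter error type.
#[derive(Debug)]
enum InterpError {
    ResolveWithoutPromise(usize),
}

// Resolving a variable index with no pending promise now reports a
// descriptive error instead of panicking.
fn resolve(
    handlers: &mut HashMap<usize, JoinHandle<i64>>,
    var: usize,
) -> Result<i64, InterpError> {
    match handlers.remove(&var) {
        Some(handle) => Ok(handle.join().unwrap()),
        None => Err(InterpError::ResolveWithoutPromise(var)),
    }
}

fn main() {
    let mut handlers: HashMap<usize, JoinHandle<i64>> = HashMap::new();
    handlers.insert(0, std::thread::spawn(|| 42));
    assert_eq!(resolve(&mut handlers, 0).unwrap(), 42);
    assert!(matches!(
        resolve(&mut handlers, 1),
        Err(InterpError::ResolveWithoutPromise(1))
    ));
}
```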
let val = match result {
Some(val) => val,
None => Value::Uninitialized,
This result was originally an unwrap, which I think better aligns with the program semantics. Typechecking should ensure that there is always a value here, so it should not be possible to create an `Uninitialized` value here.
Sorry for all the comments; hopefully some of them were helpful, but otherwise take them all with a grain of salt. Your team thought about this a lot more than I have, and it's your project. 🦀
Very cool implementation, everybody! @Pat-Lafon has some good advice here, if you're interested in continuing the hacking.
AsyncBril: Enhancing Bril with Asynchronous Programming and Thread-Style Features
By Andy He, Emily Wang, and Evan Williams
Overview and Motivation
In the final lecture of CS 6120, we talked about concurrency and parallelism and
the challenges they raise for compiler developers in the modern era of
computing. In particular, we examined how the problem of semantics is
inherently embedded in shared-memory multithreading and came across a key idea
by Boehm: threads cannot be implemented as a library.
The end of Moore's law indicates a need for parallelism to sustain advances
in computing. Thus, our goal in this project was to implement asynchronous
programming features for Bril. Modern programming languages rely heavily on
asynchronous programming and multithreading to optimize performance and manage
complex tasks. We thought the best way to learn about how these features are
implemented in real languages was to try it for ourselves. Further, Bril's
easily extensible nature makes it ideal for experimentation. We thought that
future CS 6120 students would benefit from being able to use our library to
implement their own synchronization primitives and other asynchronous
programming features using our extension.
Design and Implementation
In this project, we added two main features to Bril: promises and atomic
primitive types (i.e., `atomicint`). Promises are a fundamental
part of asynchronous programming. They allow the language to handle operations
that might take an unknown amount of time to complete (like I/O operations)
without blocking the execution of the rest of the program. Atomic operations are
crucial for safe concurrency. They ensure that operations on shared data are
completed as indivisible steps, preventing race conditions where multiple
threads might try to modify the same data simultaneously.
To implement these new language features we directly modified the Rust
interpreter for Bril: `brilirs`. We chose to implement the feature in Rust for
a few reasons. First, the three of us are more experienced with Rust than we
are with TypeScript, so it seemed the natural choice for us. Second, we wanted
to take advantage of Rust's safe-concurrency features while implementing the
asynchronous extensions, since concurrency can be tricky to reason about.
We did not want to make major changes to the structure of `brilirs`, and wanted
our changes to be as modular as possible, so as to minimize the changes needed
to extend `brilirs` in the future. A major goal while designing and implementing
this extension was to ensure that future contributors would not have to
understand our work to build features that do not require asynchronous
programming. For instance, any type extension in Bril will automatically be
supported by AsyncBril.
The heap is thread-unsafe, allowing multiple threads to concurrently read/write
to the heap without checking. This is done using Rust `unsafe` blocks.
Alternatively, we could've used an `RwLock<T>` to lock/unlock the heap when
threads were modifying it, but we ultimately decided that it would be more
performant not to implement it this way and to instead provide support for
synchronization primitives, allowing for more flexibility.

Although the interpreter's heap is not thread safe, calls to `alloc` are still
made atomically, so no thread can accidentally be assigned the same index
in `state.heap.memory` when `alloc` is called simultaneously by multiple threads.

Promises
We have introduced two critical syntax features to Bril: `Promise<T>` and
`Resolve`, significantly enhancing its capabilities in threaded execution and
asynchronous programming. The core of this implementation lies in how threads
are created and managed. Upon encountering a function defined with a
`Promise<T>` return type, a thread is immediately started. This allows for
concurrent function execution, enabling the language to handle complex,
asynchronous tasks more effectively. These threads each have a local stack,
ensuring independent operation without interfering with each other's state.
However, they also access a global heap, which is crucial for modifying
pointers shared across threads.
Here's a sketch of how our notion of promises can be written in Bril (the exact surface syntax may differ slightly):
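```
# Sketch: a promise-returning function starts a thread when called;
# `resolve` blocks until the thread finishes and yields its value.
# The argument values for size and size2 are chosen arbitrarily here.
@mod(size: int): promise<int> {
  ret size;
}

@mod2(size2: bool): promise<bool> {
  ret size2;
}

@main {
  size: int = const 42;
  size2: bool = const true;
  prom: promise<int> = call @mod size;
  prom2: promise<bool> = call @mod2 size2;
  x: int = resolve prom;
  y: bool = resolve prom2;
  print x;
  print y;
}
```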
This Bril program exemplifies the newly implemented asynchronous features in
the language, particularly demonstrating the use of `Promise<T>` and the
`Resolve` mechanism. In the `@main` function, two promises are created: `prom`
and `prom2`. The `prom` is a promise of an integer (`promise<int>`) and is
assigned the result of calling the `@mod` function with an integer argument
(`size`). Similarly, `prom2` is a promise of a boolean (`promise<bool>`) and is
linked to the outcome of the `@mod2` function, which is called with a
boolean argument (`size2`).
These promises, once initialized, signify the start of asynchronous operations.
The `@mod` and `@mod2` functions, defined with return types `promise<int>`
and `promise<bool>` respectively, indicate that the function will be executed
asynchronously. Each function simply returns the passed argument, but in a
real-world scenario, these could represent more complex operations executed in
separate threads.
The use of the `resolve` keyword in the `@main` function is critical. `resolve`
is blocking and waits for the promise to be resolved to a value, synchronizing
the asynchronous function calls. The variables `x` and `y` are assigned the
resolved values of `prom` and `prom2`, respectively. The print statements at the
end of the `@main` function display these resolved values, showcasing how
asynchronous tasks can be integrated into the main flow of a Bril program.
The `Resolve` feature, synchronized with the promise mechanism, is instrumental
in managing the life cycle of asynchronous operations. It allows threads to
signal the completion of tasks and communicate results back to the main
execution flow.
Unresolved promises are automatically cleaned up when they go out of scope. This
is implemented through an atomic flag which indicates to all sub-threads to clean
up all resources used and abort. However, as this may cause undefined behavior
when interacting with the global heap, a warning is printed to indicate that a
promise was left unresolved. In general, all promises should be resolved.
The introduction of these asynchronous capabilities necessitated a medium-sized
refactor of the `brilirs` interpreter, including parameterizing the types of
`State` to accommodate the new features. A significant enhancement is the use of
atomics in the Rust interpreter. The atomics ensure a consistent and thread-safe
mechanism to increment the dynamic count across all threads, which is crucial
for tracking and managing active threads. Overall, this implementation not only
enriches Bril with robust asynchronous programming and safe concurrency
management but also significantly extends its utility as a tool for learning
and experimenting with compiler design and parallel computing concepts.
Atomics
In the context of Bril, atomic primitives provide a way to perform certain
operations on integers in a way that is guaranteed to be atomic, meaning that
the operations are indivisible. This atomicity is crucial in multi-threaded
environments to prevent race conditions. Race conditions occur when two or more
threads access shared data and try to change it simultaneously. If one thread's
operations interleave with another's in an uncontrolled manner, it can lead to
inconsistent or unpredictable results. Atomics prevent this by ensuring that
operations like incrementing a value, setting it, or comparing and swapping it
happen as a single, uninterruptible step.
A particularly powerful application of atomic primitives is in implementing
mutexes (mutual exclusion locks). Mutexes are a foundational concept in
concurrent programming, used to prevent multiple threads from accessing a
critical section of code simultaneously. By using atomic primitives to implement
mutexes, you can ensure that only one thread can enter a critical section at a
time, thereby protecting shared resources from concurrent access.
In a typical mutex implementation using atomics, an atomic variable is used as
a lock flag. Before a thread enters a critical section, it uses atomic
operations to check and set this flag. If the flag indicates that the mutex
is already locked, the thread will wait (or "spin") until the flag changes.
This is often implemented using a compare-and-swap atomic operation, where the
flag is only set if it is currently in the expected unlocked state. Once the
thread has finished executing the critical section, it uses another atomic
operation to set the lock flag back to its unlocked state.
Here's a sketch of how atomics can be used in Bril (again, the exact surface syntax may differ slightly):
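```
# Sketch: create an atomic integer, then exercise cas, loadatomic, and
# swapatomic. The result type of cas (a success flag) is an assumption.
@main {
  one: int = const 1;
  size: atomicint = newatomic one;

  expect: int = const 1;
  upd: int = const 2;
  res: bool = cas size expect upd;    # succeeds only if size == expect

  res1: int = loadatomic size;        # atomically read the current value

  three: int = const 3;
  res2: int = swapatomic size three;  # swap in 3, capture the old value

  res3: int = loadatomic size;        # read the post-swap value

  print res;
  print res1;
  print res2;
  print res3;
}
```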
In our Bril program, we demonstrate the use of atomic operations to manage a
shared atomic integer. The program starts by initializing a regular integer
`one` with a value of 1, which is then used to create an atomic integer `size`
through the `newatomic` operation. The utilization of atomic integers is crucial
in concurrent programming environments, as it ensures that the integer can be
safely read and modified by multiple threads without causing race conditions.

We then perform a compare-and-swap (CAS) operation on the atomic integer `size`.
This atomic action is essential for achieving synchronization in multi-threaded
programs. The `cas` operation attempts to update `size` from its expected value
(`expect`, which is 1) to a new value (`upd`, which is 2). The success of this
operation, stored in `res`, hinges on whether the current value of `size`
matches the expected value, ensuring that the update occurs only if `size` has
not been altered by another thread. We print the result of this operation to
reflect its success or failure.
Furthermore, we employ the `loadatomic` operation to safely read the current
value of `size` into `res1`, ensuring a consistent and uncorrupted value.
Following this, we use the `swapatomic` operation, which atomically swaps
`size`'s value with `three` (3). The previous value of `size` is captured in
`res2`. This swap operation is particularly useful for updating a shared
variable while simultaneously retrieving its old value. Another `loadatomic`
operation follows, storing the post-swap value of `size` in `res3`.
Finally, our program concludes by printing the results of these operations
(`res`, `res1`, `res2`, and `res3`). This output showcases the effects of
atomic operations on the shared atomic integer, providing a clear example of
how atomics can be used to manage shared state in a concurrent environment
without the need for complex and potentially cumbersome lock-based
synchronization mechanisms.
Elaborating more on how our atomics can implement a mutex lock and allow threads
to concurrently modify an array, here is a sketch of another useful pattern:
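```
# Sketch of a spin lock built from the atomic operations above; label
# names and the cas operand order are assumptions.
@acquire_lock(lock: atomicint) {
.spin:
  cur: int = loadatomic lock;         # observe the current state
  zero: int = const 0;
  one: int = const 1;
  acquired: bool = cas lock zero one; # try to flip 0 (free) -> 1 (held)
  br acquired .done .spin;            # retry until the cas succeeds
.done:
  ret;
}

@release_lock(lock: atomicint) {
  zero: int = const 0;
  old: int = swapatomic lock zero;    # mark the lock as free again
  ret;
}
```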
The `@acquire_lock` function tries to acquire a lock on a shared resource
represented by `lock`, an atomic integer. The function first loads the current
value of `lock` atomically, and then loops to keep retrying to acquire the lock
if it's already being used by another thread. To attempt to acquire the lock, a
compare-and-swap (CAS) operation is used.
The `@release_lock` function releases the lock. It sets the lock to `0` using
an atomic swap operation, indicating that the lock is free. The use of atomic
operations ensures that these lock acquire and release operations are
thread-safe and prevent race conditions in concurrent environments.
Challenges
The original implementation of `brilirs` was understandably not designed with
multithreading in mind. As a result, we had to change some of the architecture
of the data structures and main function headers. To simplify the design,
`State` no longer has a parameterized lifetime, and instead uses an atomic
reference counter, `Arc<T>`, to hold the program and environment. Further, to
implement support for atomic ints, we needed a secondary state variable that
would hold all the instantiated atomics. We could not repurpose the heap for
this, as `Value` needs to derive the `Copy` trait, which is not supported for
Rust's `AtomicI64`. To remedy this, we added an additional data structure,
modeled after `Heap`, to store all atomics.

Evaluation
To evaluate our extension, we primarily focused on two key attributes:
correctness and runtime. To evaluate the runtime gains from asynchronous
programming, consider the following simple benchmark: we use a loop that just
adds 1 to a variable. The first thread performs 2000000 iterations, the
second thread performs 1000000, and so on. One good example of when a
benchmark like this would be used is in multiplying two large matrices
together. Below, we show the execution time in milliseconds against the number
of threads used in the task:
We observe that from 1 thread to 5 threads, we see substantial performance
improvements. This makes sense, as we are taking direct advantage of
parallelization in our programs. From 5 threads to 50 threads, however, there
is actually a slight increase in the execution time. This also makes sense,
as an increased number of threads requires more careful resource allocation by
the operating system. Since our implementation requires locks on the global
heap, more threads means more actors vying for the locked resources,
causing a general slowdown in overall performance.
To evaluate the correctness of our programs, we noted that it is not
particularly useful to use pre-existing Bril benchmarks, as none of them are
written with asynchronous features in mind. We instead took a different
approach: we can look at asynchronous Bril programs that are implemented
incorrectly and then observe how their behavior differs from the expected
behavior. In other words, concurrent array accesses with variables that are not
atomic will lead to undefined behavior. Likewise, when atomics are used
correctly, the result of the program is deterministic and predictable. This can
easily be observed by taking the Bril program above used to showcase atomics
and removing the atomicity of the integer (i.e., making it an `int` instead of
an `atomicint`). We quickly see why the atomics are necessary, and how they
guarantee the correctness of the program.
Overall, evaluating asynchronous programming features can be challenging, but
we believe that through our thorough testing within the interpreter and our
Bril benchmarks, we have created a robust extension that can be used by many
future Bril users.
Conclusion
Concurrent programming is hard to understand, and it is important to begin
to understand how to implement concurrent programming features at the compiler
level. As we noted at the start, threads cannot be implemented as a library;
threading is inherently embedded in the semantics of the language. We gained
a lot of experience in this project with asynchronous programming and
concurrent programming features in Rust. Overall, we are excited to see this
extension used in future iterations of CS 6120, and we hope that more students
will build upon the work we've done to create many interesting optimizations
involving parallel programming in Bril.