Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add Windows support #235

Merged
merged 37 commits into from Aug 25, 2018
Merged

Conversation

alessandrogario
Copy link
Contributor

@alessandrogario alessandrogario commented Mar 7, 2018

This is an overhaul of the CMake code, based on what I wrote for mcsema1 (but never merged).

New features:

  1. Windows (Visual Studio 2017 and 2015) and macOS support
  2. Adds support for the install target
  3. Adds a CMake config file in the library folder that can be automatically used by other projects by calling find_package(remill)
  4. Detects changes to the semantic files and automatically recompiles the runtimes!

Note that his PR requires my other alessandro/feature/windows-support branch on the mcsema repository!

@alessandrogario alessandrogario force-pushed the alessandro/feature/windows-support branch 4 times, most recently from 77c3f87 to 04bc009 Compare March 11, 2018 16:15
@alessandrogario
Copy link
Contributor Author

This branch is now passing all tests locally; Travis reports an error on the same FPU instruction that fails on master

@alessandrogario alessandrogario changed the title WIP: Add Windows support Add Windows support Mar 11, 2018
@alessandrogario alessandrogario force-pushed the alessandro/feature/windows-support branch from 04bc009 to bafd566 Compare March 29, 2018 22:30
@alessandrogario alessandrogario changed the title Add Windows support WIP: Add Windows support Mar 29, 2018
@alessandrogario alessandrogario force-pushed the alessandro/feature/windows-support branch from 48e2666 to 89802d9 Compare March 31, 2018 15:35
Copy link
Collaborator

@pgoodman pgoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work as usual. Left some questions/comments. :-)

set(REMILL_LIBRARY_LOCATION "${CMAKE_INSTALL_PREFIX}/remill/lib/remill.lib")
set(REMILL_INCLUDE_LOCATION "${CMAKE_INSTALL_PREFIX}/remill/include")
else()
set(REMILL_LIBRARY_LOCATION "${CMAKE_INSTALL_PREFIX}/lib/libremill.a")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: does this deal with versioning? That is, can I have two Remills installed, one for LLVM 3.x, and another for 4.x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great to implement! Right now you can only expect to install one version system-wide (in my opinion it should match the LLVM version available in the package repository).

You can still install more than one version using prefixes, and then essentially use the same method we use to find_package LLVM.

I am adding this in the todo list though, so that we can implement this later!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for us to prepare for this now, e.g. by making libremill.a a symlink to libremill-M.m.a?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe having an install directory version number would make sense, with symlinks? You are ultimately the best person to decide on how this should all work. In theory, one might want to eventually have a find_package for CMake allow the Remill version to be specified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have to think about this, as we also have other things to keep in mind (like the semantics path or the compatibility with mcsema). Maybe I can do some tests and post the results!


// Windows doesn't have the following macros defined
#ifndef _SW_INEXACT
#define _SW_INEXACT 0x00000001 // Inexact (precision)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to condition these on __x86_64__, not sure yet.

@@ -16,9 +16,11 @@

#include <algorithm>
#include <bitset>
#include <cmath>

#include "remill/Arch/FPUFlags.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we rename FPUFlags.h to something like Float.h, and have it #include cfenv and cfloat at the end of the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change this, but I think I have to keep the includes at the top because I am testing whether the defs are missing or not before actually defining them

namespace {
extern "C" std::uint32_t GetProcessId(std::uint32_t handle);

std::uint32_t getpid() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getpid(void).

#include <vector>
#include <fstream>

#ifndef WIN32
#include <dirent.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent the #includes into # include within the ifndef block.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you've done all the cross-platform work to make this file work. I kind of wonder if we should just include some LLVM headers and use what's available in llvm::sys::fs and llvm::sys::path to implement the functions in this file in a cross-platform way.

@@ -493,6 +493,12 @@ static void ResetFlags(void) {
// but the logic in older versions (like eglibc 2.19, used on some Ubuntu 14.04 installations)
// does not clear exception flags and also does *not* ignore denormal exceptions
// see: https://sourceware.org/ml/libc-alpha/2015-10/msg01020.html

// Define the flag for macOS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this into the proposed remill/Arch/Float.h? E.g. re-defining FE_ALL_EXCEPT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Will do! I was still experimenting (especially with Travis) on OSX

@alessandrogario alessandrogario force-pushed the alessandro/feature/windows-support branch 7 times, most recently from 10bdb6f to 0d4f9c7 Compare April 1, 2018 18:14
@alessandrogario alessandrogario force-pushed the alessandro/feature/windows-support branch from fed6119 to bd5238b Compare May 3, 2018 16:18
@alessandrogario alessandrogario force-pushed the alessandro/feature/windows-support branch from 96e43f1 to 6ede014 Compare May 7, 2018 18:00
pgoodman and others added 11 commits June 26, 2018 12:42
* Add initial prototying of DeadStoreEliminator

* Improve upon DeadStoreEliminator prototype

Update `DeadStoreEliminator` prototyping with additional comments and
proper function prototype return values.
Fix a small warning in `Lifter.h` by adding `llvm::Value` and
`llvm::BasicBlock`.

* Add initial code for DeadStoreEliminator

Provide namespaced `remill::StateSlots` function which visits fields of
the module's state struct and returns `StateSlot` objects.
Remove `comment` field as the `State` object (an `llvm::StructType`)
does not track names in a useful way.

* Update state analyzer and fix type errors

Update `DeadStoreEliminator.cpp` to properly produce a vector of
`StateSlot`s without type errors (i.e. it compiles).
Code still needed for other container types besides structs.

* Add remill::VisitSequential for arrays and vectors

Add state analyzer code for LLVM's `ArrayType` and `VectorType`.
Add debugging use of `llvm::Type::dump()`.
Consider refactoring as a `StateVisitor` class.

* Begin refactor of Visit funcs to StateVisitor

* Fix segfault in StateVisitor::visit, code style

* Prototype ForwardAliasVisitor for alias analysis

Add code for ForwardAliasVisitor subclass of `llvm::InstVisitor`,
to be used for performing alias analysis of lifted functions.

* Add non-working visit functions for alias analysis

* Fix compilation of alias analysis visit funcs

* Change ForwardAliasVisitor to RetTy=bool

Update ForwardAliasVisitor to use booleans for return types to track
when we should add the instruction to the next_wl.
Add code to simplify StateSlots for vecs of ints.

* Add progress tracking to AliasAnalysis

* Correct progress tracker, use BasicBlockFunction()

* Change ForwardAliasAnalysis<RetTy = AliasResult>

Get some of those sweet sweet enums in there!

* Add FAV state pointer field, PHINode impl

* Allow non-const add and sub in FAV

Allow add and sub instructions with two pointers in the offset map.
Add to implementation of visitPHINode.

* Update PHINode impl in FAV

* Clean up use of StateSlots to create AAMDNodes

Add code to create AAMDNodes from StateSlot elements.
Change creation of StateSlot vector to have elements for every
byte offset of the state structure for fast indexing.

* Complete addition of AAMDNodes for load and store

Move function defs up for AAMDNode ops.
Add AliasMap typedef.
Finish generateAAMDNodesFromSlots.

* Add GenerateLiveSet func

* Initialize live set, add AAMDNodes to stores

* Change LiveSet creation to a block visitor class

* Add to_remove set to LSBV

* Update build script to use os-release for OS detection

* Update DSE code to conform to pag's review

* Refactor LiveSetBlockVisitor to one LiveSet per block

* Update VisitBlock to better check instruction type

* Fix VisitBlock CallInst and InvokeInst cases

* Stack allocate AAMDInfo in AnalyzeAliases

* Add remove pass option for VisitBlock

* Add DOT digraph generation

* Fix bugs in dot digraph

* Fix various bugs in AAMDNodes and LSBV

Move AAMDNodes functions later to match their use.
Clean up code conventions.
Fix small bugs in various spots in the code.

* Fix various function prototypes, overflow checks

* Merge AliasMap and OffsetMap

* Change add/sub insts to be safer

Add OpType enum class to replace use of `plus` bool.
Add more straightforward bounds checking on AddInst or SubInst values.

* Refactor GetUnsignedOffset style

* Fix illegal instruction error

* Add offset checking for GEP, provide log messages

Write a log message for the cases where `GetUnsignedOffset` returns
false (indicating an overflow or underflow).

* Fix APInt initialization in VisitGEP

* Re-add dead store elimination and alias map

* Move LSBV into alias analysis, expand callinst

Expand definition of cases where a CallInst should be considered to
touch the state struct or otherwise revive a slot.

* Improve DOT creation, fix errors for callinsts

Fix small errors in LiveSetBlockVisitor::CallAccessesState.
Clean up DOT digraph generation further.

* Clean up code per @pag's comments

* Clean up code, add selectinst, fix compile errors

Deal with a few small corner cases with FAV Load and Store instructions,
add SelectInst visitor.
Inline MarkLiveArgs to avoid the complications of C++ generics.

* Clean up logging

* More changes to please Peter

Clean up select and PHI node cases to improve circular dependency
handling.

* Correct errors in visitSelect

* Add load forwarding code

* Add code to run FBV

* Fix map usage error in FBV

* Add call, invoke cases for FBV

* Here are the changes whoops

* Minor API change

* Minor tweaks to DOT digraph printing, as well as attempts to handle the case where the forward analysis pass is incomplete. Still don't have it guarantee completion on everything, but seems 'good enough' for now.

* Change log level

* Fix compile errors due to messy merge

* Pag dead store (#262)

* Fix to script calling wrong function.

* Makes sure that value names are preserved (#249)

* Adds LLVM_VERSION() to accomodate llvm::LLVMContext::setDiscardValueNames() in <3.9 (#250)

* Makes sure that value names are preserved

* Adds LLVM_VERSION() macros to accomodate

* Update README.md

* Update README.md

* set state and memory as noalias (#254)

* Implements some ring 0 instructions in terms of hyper calls and new I/O port intrinsics. (#252)

* Random tests.

* More decode error info

* more playing around

* more system instructions. instrinsics for accessing I/O ports. Split our writes to individual control regs for better identification via hyper calls.

* CR8 read/write support (#255)

* Check argument index of function (#256)

* Fix NoAlias Attributes for older LLVM versions

* Fix for LLVM 4.0 and 3.9

* Fix typo

* Follow remill coding style

* Here are the changes whoops

* Minor API change

* Minor tweaks to DOT digraph printing, as well as attempts to handle the case where the forward analysis pass is incomplete. Still don't have it guarantee completion on everything, but seems 'good enough' for now.

* Change log level

* Fix compile errors due to messy merge

* Improve call/invoke case of FBV

* Improve call/invoke case of FBV

* Removes a level of indirection in the __remill_basic_block function

* Create dedicated stats tracker

Plus clean up the FBV visitor a teensy bit more.

* Move call/invoke LiveSet gen to static func

This commit is to pave the way for future improvements to where this
information is calculated (in FAV instead of LSBV).

* Add code to FAV for calls and invokes

Move call and invoke arg-based livesets to FAV to allow for module-level
LSBV code.

* Begin change of LSBV to module-level

* Minor bug fixes

* Remove some dead code

* Begin adding more code for call/invoke LSBV

* Add entry block checks for LSBV call/invoke

* Info about register names, as well as printing them in the digraphs.

* Make DOT printing only happen per function, as opposed to printing every function per DOT file. Minor tweaks to interprocedural analysis.

* Bug fixes and DOT printing improvements.

* Bug fixes and DOT printing improvements.

* Also print out DOT digraphs of functions after removing stuff

* Tried to make it treat everything before a call to __remill_error as dead but that didn't work out.

* Use datalayout and type sizes to handle the number of elements in a sequential type for LLVM 3.8 compatibility, also check for pointer type.

* More stats, hopefully fixes a bitcast issue.

* Minor bug fixes, and more DOT printing to help diagnose when the offset analysis terminates but there is still stuff in the work list.

* Some possible bug fixes

* Minor bug fix
@dureuill
Copy link
Contributor

Hello,
We've been trying to use this branch to evaluate if this fixes #265.

Unfortunately, we ran into some issues:

  • CMAKE_ASM_COMPILER and friends have a mangled value
    ** (see inline comment for details)

  • "Wrong" version of LLVM is selected

I have several versions of LLVM on my computer. I have a system-installed LLVM 3.5, plus a LLVM 5.0.2 in a custom directory. I build remill with the the LLVM_INSTALL_PREFIX set to the specific path that contains the LLVM 5.0.2 installation.

On master, the detected remill version is 5.0.2, which is what I want (since I specified this in the LLVM_INSTALL_PREFIX). However, on this branch, the detected version is my system version (3.5).

This is critical because my code using remill relies on LLVM 5.

  • Compilation fails with the same message as in the travis builds
/usr/include/x86_64-linux-gnu/bits/fenv.h:29:5: error: expected identifier
    __FE_DENORM = 0x02

remil/remill/Arch/Float.h:22:22: note expanded from macro '__FE_DENOM'
# define __FE_DENORM 0x02

We couldn't test if this branch fixes the issue with remill lib installation, since we couldn't actually reach the make install step. However, reading the code, we believe that it would be the case. Thank you for this PR.

if(NOT DEFINED CMAKE_ASM_COMPILER)
if(DEFINED LLVM_INSTALL_PREFIX)
set(CMAKE_ASM_COMPILER "${LLVM_INSTALL_PREFIX}/bin/clang++${executable_extension}"
CACHE PATH "Path to assembler (aka clang) binary." PARENT_SCOPE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With both PARENT_SCOPE and CACHE, this generates a mangled value on my system:
"/home/dureuill/build/llvm/llvm/bin/clang++;CACHE;PATH;"Path to assembler (aka clang) binary."

I believe this is because in CMake, a set value is either a local variable (and you can use PARENT_SCOPE), or a cache variable (and you cannot use PARENT_SCOPE) (see cmake documentation)

Since CMAKE_ASM_COMPILER is a cache variable, you should keep the CACHE, and leave out PARENT_SCOPE.
This fixes the problem on my machine.

This also applies to other CACHE variables in this file (CMAKE_CXX_COMPILER and CMAKE_C_COMPILER).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try this change.

* Make DSE sensitive to indirect function calls

* FE_DENORM issue

* Added some compat code that implements the futimens syscall on mac os 10.12 for travis support.
* Make DSE sensitive to indirect function calls

* FE_DENORM issue

* Added some compat code that implements the futimens syscall on mac os 10.12 for travis support.

* Playing with CACHE and PARENT_SCOPE

* Minor stack address size check
Adds in 32-bit libraries for Linux builds.
@pgoodman pgoodman merged commit 85dc128 into master Aug 25, 2018
@pgoodman pgoodman deleted the alessandro/feature/windows-support branch August 25, 2018 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants