Roadmap #1

Open
hpoit opened this issue May 12, 2016 · 3 comments

Comments

@hpoit
Owner

hpoit commented May 12, 2016

No description provided.

@hpoit
Owner Author

hpoit commented Aug 29, 2016

Representations

  1. Symbols
  2. Consolidated MLN clauses
  3. Data schema

MLN tasks

  1. MRF partitioning, which can substantially improve result quality
  2. MAP inference, to find the most likely possible world
  3. Marginal inference, to estimate marginal probabilities
  4. Weight learning, to learn the weights of MLN rules given training data
  5. Rule learning, with reinforcement learning
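
To make tasks 2 and 3 concrete, here is a minimal brute-force sketch in Python. It assumes the usual MLN semantics, where a possible world is weighted by the exponential of the summed weights of its satisfied ground clauses. The atoms, clauses, weights, and function names are hypothetical and chosen only for illustration. Enumeration is exponential in the number of ground atoms, so this only works on toy inputs.

```python
import itertools
import math

# Hypothetical ground atoms and weighted ground clauses (illustration only).
# A clause is (literals, weight); a literal is (atom_index, is_positive).
ATOMS = ["Smokes(A)", "Cancer(A)"]
CLAUSES = [
    ([(0, False), (1, True)], 1.5),  # !Smokes(A) v Cancer(A)
    ([(0, True)], 0.5),              # Smokes(A)
]

def clause_satisfied(literals, world):
    """A disjunctive clause holds if at least one literal matches the world."""
    return any(world[i] == positive for i, positive in literals)

def world_score(world):
    """Sum of the weights of satisfied clauses (the log of the unnormalized weight)."""
    return sum(w for lits, w in CLAUSES if clause_satisfied(lits, world))

def enumerate_worlds():
    """All truth assignments over ATOMS, as tuples of booleans."""
    return list(itertools.product([False, True], repeat=len(ATOMS)))

def map_world():
    """MAP inference by exhaustive search: the highest-scoring world."""
    return max(enumerate_worlds(), key=world_score)

def marginals():
    """Marginal inference by exhaustive summation: P(atom is true) for each atom."""
    worlds = enumerate_worlds()
    weights = [math.exp(world_score(w)) for w in worlds]
    z = sum(weights)  # partition function
    return [
        sum(wt for w, wt in zip(worlds, weights) if w[i]) / z
        for i in range(len(ATOMS))
    ]

best = map_world()
probs = marginals()
```

Sampling-based methods replace the exhaustive loops when the number of ground atoms grows, but the quantities being computed are the same.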

Functionalities

  1. Prolog/Datalog: execute logical rules in addition to MLN rules (embed the engine through C)
  2. Functions: a library of common numeric/string/boolean functions, which can be
    used inside an MLN rule. In particular, to perform arithmetic manipulation and comparison
    in MLN rules.
  3. Predicate scoping: sometimes grounding the atoms of even a single predicate will exhaust RAM. On the other hand, you often care about only a particular subset of the
    exhaustive set of ground atoms. This feature lets you explicitly specify the atoms you are
    interested in, so that your program becomes runnable again.
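
A rough sketch of what predicate scoping could look like, in Python. The function name `ground_atoms` and its `scope` parameter are assumptions made for this example, not an existing API. Without a scope, the predicate is grounded over every combination of constants. With a scope, only the listed argument tuples are grounded, and the full cross product is never built.

```python
import itertools

def ground_atoms(predicate, arity, constants, scope=None):
    """Return ground atoms of `predicate` as (predicate, args) pairs.

    scope: optional iterable of argument tuples to restrict grounding to
    (hypothetical parameter, for illustration).
    """
    if scope is not None:
        known = set(constants)
        # Iterate over the scope directly, keeping only well-formed tuples.
        return [
            (predicate, tuple(args))
            for args in scope
            if len(args) == arity and all(a in known for a in args)
        ]
    # Exhaustive grounding: len(constants) ** arity atoms.
    return [
        (predicate, args)
        for args in itertools.product(constants, repeat=arity)
    ]

constants = ["A", "B", "C"]
full = ground_atoms("Friends", 2, constants)
scoped = ground_atoms("Friends", 2, constants, scope=[("A", "B"), ("B", "C")])
```

The exhaustive case grows as the number of constants raised to the predicate's arity, which is why a scope matters for large domains.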

Distribution

  1. Task decomposition, by separating complex tasks into subtasks handled by specialized algorithms
  2. Data partitioning, by automatic parallelization of complex statistical tasks
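
One simple form of partitioning, sketched below in Python, splits the ground clauses into connected components: clauses that share no atoms, directly or transitively, can be processed independently and therefore in parallel. This is an assumption about how partitioning might be done, not a description of a settled design. The function name and the input format (each clause given as a non-empty list of atom names) are hypothetical.

```python
def partition_clauses(clauses):
    """Group clause indices by connected component.

    clauses: list of non-empty lists of atom names (hypothetical format).
    Two clauses end up in the same group if they are linked through shared atoms.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # All atoms that appear in the same clause belong to one component.
    for atoms in clauses:
        first = atoms[0]
        find(first)
        for other in atoms[1:]:
            union(first, other)

    groups = {}
    for idx, atoms in enumerate(clauses):
        groups.setdefault(find(atoms[0]), []).append(idx)
    return sorted(groups.values())

parts = partition_clauses([["a", "b"], ["b", "c"], ["d"], ["e", "d"]])
```

Each group can then be handed to a separate worker, since no atom is shared across groups.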

@hpoit
Owner Author

hpoit commented Sep 5, 2016

Hi @tawheeler, @sbromberger. FYI, from a functional perspective, this is what I just sketched out:

Tuffy Program
Goal: scale relational operations during grounding phase of MLN inference through RDBMS
Method: use hybrid solution of RDBMS-based grounding and in-memory search
Method: use partitioning to further improve space and time efficiency of MLN

General Functionalities
1a. Symbol table to convert all logic constants into integer IDs
1b. Consolidate MLN clauses of same pattern
1c. PostgreSQL to store input and intermediate data, e.g. ground Markov network object
2a. Efficient grounding via SQL queries (RDBMS-based) with KBMC, and
2b. Lazy inference (in-memory search) for MLN formula grounding, resulting in a Markov random field
2c. Partitioning and inferring on MRF
3a. MAP inference with WalkSAT
3b. Marginal inference with MC-SAT
4. Discriminative weight learning with Diagonal Newton (Lowd and Domingos)
Result: scalability of MLN inference and of grounding phase
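
Items 1a and 3a can be sketched together in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not Tuffy's implementation. `SymbolTable` and `max_walksat` are hypothetical names, all clause weights are assumed positive, and the search is a MaxWalkSAT-style weighted variant of WalkSAT that minimizes the total weight of violated clauses.

```python
import random

class SymbolTable:
    """Maps each distinct name to a dense integer ID (item 1a)."""
    def __init__(self):
        self._ids = {}
        self._names = []

    def intern(self, name):
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name(self, ident):
        return self._names[ident]

    def __len__(self):
        return len(self._names)

def is_satisfied(literals, assignment):
    return any(assignment[a] == positive for a, positive in literals)

def unsat_cost(clauses, assignment):
    """Total weight of the clauses violated by the assignment."""
    return sum(w for lits, w in clauses if not is_satisfied(lits, assignment))

def max_walksat(clauses, n_atoms, max_flips=1000, p_random=0.5, seed=0):
    """MaxWalkSAT-style local search for a low-cost assignment (item 3a)."""
    rng = random.Random(seed)
    assignment = [rng.random() < 0.5 for _ in range(n_atoms)]
    best, best_cost = list(assignment), unsat_cost(clauses, assignment)

    def cost_after_flip(atom):
        assignment[atom] = not assignment[atom]
        cost = unsat_cost(clauses, assignment)
        assignment[atom] = not assignment[atom]  # undo the trial flip
        return cost

    for _ in range(max_flips):
        unsat = [lits for lits, _ in clauses if not is_satisfied(lits, assignment)]
        if not unsat:
            break  # every clause is satisfied; nothing left to repair
        lits = rng.choice(unsat)
        if rng.random() < p_random:
            atom = rng.choice(lits)[0]  # random-walk step
        else:
            atom = min((a for a, _ in lits), key=cost_after_flip)  # greedy step
        assignment[atom] = not assignment[atom]
        cost = unsat_cost(clauses, assignment)
        if cost < best_cost:
            best, best_cost = list(assignment), cost
    return best, best_cost

table = SymbolTable()
smokes = table.intern("Smokes(A)")
cancer = table.intern("Cancer(A)")
clauses = [
    ([(smokes, False), (cancer, True)], 1.5),  # !Smokes(A) v Cancer(A)
    ([(smokes, True)], 0.5),                   # Smokes(A)
]
best, cost = max_walksat(clauses, len(table))
```

Working on integer IDs rather than strings keeps the clause representation compact, which is presumably the point of the symbol table in 1a.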

Felix Program
Goal: efficient inference in Markov logic by exploiting the subtasks common to text-processing tasks.
Method: use specialized algorithms for each task

General Functionalities

  1. Algorithms CC, LR, Tuffy
  2. Compiler to find (or use?) algos automatically
  3. Data-movement optimizer built into an RDBMS

Result: scale to complex information extraction programs on large datasets and generate results with higher quality than state-of-the-art IE approaches.

I am counting on you guys for a peer review from time to time as I move forward. Thank you.

@tawheeler

Hi @hpoit. This is very MLN-specific so I can't really say whether it is a good approach or not.
I would recommend against implementing things like SQL backends before you have a very basic version of everything else working first.

I'd start with:

  • using LightGraphs.jl to represent a graph
  • have a vector of clauses, one for each node in the graph
  • have some initially simple representation for the clauses
  • make sure you can evaluate the MLN and do the basic stuff you need to do (basic inference, basic weight learning, etc.)
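
The starting point above could look roughly like the following. This is a sketch in Python rather than Julia, with plain adjacency sets standing in for a LightGraphs.jl graph. The `Clause` type, the rule that two clause nodes are connected when they share an atom, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    """A deliberately simple weighted disjunctive clause."""
    weight: float
    literals: tuple  # tuple of (atom_name, is_positive) pairs

    def satisfied(self, world):
        return any(world[atom] == positive for atom, positive in self.literals)

def build_graph(clauses):
    """Adjacency sets where node i stands for clauses[i].

    Two nodes are connected if their clauses mention a common atom
    (an assumed convention, chosen for this sketch).
    """
    atoms = [{atom for atom, _ in c.literals} for c in clauses]
    adjacency = [set() for _ in clauses]
    for i in range(len(clauses)):
        for j in range(i + 1, len(clauses)):
            if atoms[i] & atoms[j]:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def log_score(clauses, world):
    """Evaluate the MLN on a world: sum of weights of satisfied clauses."""
    return sum(c.weight for c in clauses if c.satisfied(world))

clauses = [
    Clause(1.5, (("Smokes(A)", False), ("Cancer(A)", True))),
    Clause(0.5, (("Smokes(A)", True),)),
    Clause(2.0, (("Smokes(B)", True),)),
]
graph = build_graph(clauses)
score = log_score(
    clauses, {"Smokes(A)": True, "Cancer(A)": False, "Smokes(B)": True}
)
```

With this in place, basic inference and weight learning can be prototyped against `log_score` before any database backend is introduced.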
