`What is max flow / minimum cut?`
Say you are an oil company and want to ship as much of your oil as possible.
There's no cost to shipping oil, so the idea is just to ship as much at once as you possibly can.
You live in a directed graph, with a source and a sink.
Each connection has a maximum amount that you can push through it.
You want to push as much oil as you can from the source to the sink.
You can think of it as 'oil pushed per unit time' or something to that effect.

`I don't get the graph thing.  If we choose e, why would we choose f?  How exactly are we choosing paths?`
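You can push flow backward along an edge if there is currently flow on that edge.  Doing so cancels some of the existing flow, which is how a later path can reroute an earlier one.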

`But f seems to have 1 flow going the opposite direction, so it canceled e and did an additional flow, so it has net 2 flow.  But that's not possible.  What's going on?`
No, it's 1 unit of flow.  Think of e and f as being superimposed onto each other.  Then the middle 2 cancel out, and you get the diamond shape.

`How do you know when you're done with max flow?`
When you can't find any more paths from the source to the sink.
A path exists if every pipe on it either has spare capacity in the forward direction, or carries flow that you can push back in reverse.

`What stops the flow graphs from cancelling each other out, leading to an infinite loop?`
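Because the flow graphs aren't actually cancelling out.  When we create e, then create f, e's thing doesn't get 'canceled out'.  They're just superimposed on top of each other.  So we know every way we've gone before.  No information is lost.  Also, every new path adds at least 1 unit of total flow (assuming integer capacities), and the total can never exceed the capacity leaving the source, so the process has to stop.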

`Where does min cut come in?`
Cutting the graph means drawing a line through the graph so that the source is on one side and the sink is on the other.  Insert the example from 7.2.4 here.  In this example, we have found the minimum cut, since 7 units of flow pass over the cut.  We could also cut it near the beginning for a cut of 10 flow units, but since the other cut is only 7 units, we know we won't be able to use all 10 units.  Think of min cut as kind of like a pinch in a garden hose.  A garden hose can only output as much water as its narrowest section lets through.  Well, not really, but you know what I mean.

Min cut is the dual of max flow: the value of the max flow always equals the capacity of the min cut.

`What is the actual algorithm?`

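Do breadth first search from the source in the residual graph until you get to the sink.  Find the bottleneck on that path, meaning the smallest remaining capacity of any edge on it.  Subtract the bottleneck from every edge on the path and add it to the matching reverse edges.  You only need the one residual graph for this, not a third graph.  Do this until the search fails to reach the sink.  Then the flow you pushed is the max flow, and the nodes the search can still reach from the source form one side of a min cut.  With breadth first search choosing the paths (this version is called Edmonds-Karp) the runtime is $O(V \cdot E^2)$.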

So you start a residual graph, which at the beginning tells you the max capacities.
Then you find a path through the graph.
Then you change your graph based on the path.
Do this until the graph is split in two and breadth first search can't reach the sink from the source anymore.
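
The loop above can be sketched in Python.  This is a minimal sketch and not the book's code: the function name `max_flow`, the dict-of-edges input format, and the little diamond graph at the bottom are all made up for illustration.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """capacity maps (u, v) -> capacity of the directed edge u -> v.

    Returns (max flow value, set of nodes on the source side of a min cut).
    """
    residual = {}   # remaining capacity on every edge and reverse edge
    neighbors = {}  # adjacency in both directions, so reverse edges are usable
    for (u, v), cap in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + cap
        residual.setdefault((v, u), 0)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def bfs():
        # Breadth first search over edges that still have residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    while True:
        parent = bfs()
        if sink not in parent:
            # No path left: everything still reachable is one side of a min cut.
            return total, set(parent)
        # Walk back from the sink to recover the path that was found.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[edge] for edge in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck  # use up forward capacity
            residual[(v, u)] += bottleneck  # allow this flow to be undone later
        total += bottleneck

# Hypothetical diamond graph with a middle edge a -> b, every capacity 1.
example = {("s", "a"): 1, ("s", "b"): 1, ("a", "b"): 1, ("a", "t"): 1, ("b", "t"): 1}
flow, source_side = max_flow(example, "s", "t")
```

The reverse edges are what let a later path cancel part of an earlier one, so nothing ever has to be recomputed from scratch.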

`What do you need to learn min cost flow?`
Minimum cost flow can be solved with the network simplex algorithm.
The network simplex algorithm is a special variant of the simplex algorithm.
The simplex algorithm is used to solve linear programs.
So maybe max flow -> linear programming -> simplex -> network simplex -> minimum cost flow.

`What is linear programming?`
You have a bunch of linear inequalities.  Each one's boundary is a line/plane/hyperplane that cuts space in half.
Each inequality says 'on one side of me are potential solutions, on the other side are not'.
You need to find the solutions that all of the linear inequalities agree are allowed (the feasible region).
Then out of those solutions, you have an objective function (to maximize or minimize) that tells you the 'best' one.

`What is the simplex algorithm?`
Your linear inequalities will create a polygon in the solution space.
If an optimal solution exists, there is always one at a point (corner) of this polygon.
If a point that isn't a corner is also optimal, then a whole edge or face is optimal and there are infinitely many optimal solutions.  If nothing is optimal, the problem is either infeasible or unbounded.
Anyway, you have this polygon, and simplex 'hill climbs' this polygon.
It starts at one corner of the polygon
and checks that corner's value with the objective function.
That value is the best one it has seen so far.
It looks ahead to the neighboring corners.
If a neighboring corner is better, it moves to that corner.
If simplex looks ahead and every neighboring corner is the same or worse (less than or equal to) the corner it's on right now, simplex knows it has found the optimal solution.

`Why does that imply it's the optimal solution?  Why couldn't it be in a local maximum?`

Remember that this is linear programming.  There's only lines, planes, and hyperplanes.  Try and make a local optimum using just those.  Well, don't, actually, because you can't.  More formally, since this is _linear_ programming, the feasible region is convex and the objective function is linear, so any local optimum is also a global optimum.

`How does simplex hill climb in 3d and beyond?`
It hill climbs pretty much the same, except now simplex has more neighbors to check before picking a better one.  In 3d, a (non-degenerate) vertex is where 3 inequalities are tight, and releasing each one leads to a different neighbor, so simplex starts at some point and has 3 adjacent points to check.  Then it goes to a better one.  Then at its new location it checks its 3 adjacent points.  Or really 2, considering it was just at 1 of those 3 neighbors.  In general, in n dimensions you would check up to n neighbors: 4 in 4d, 5 in 5d, etc.

`Simplex can't 'see' the polygon it's optimizing on.  How does simplex find 'corners' without using geometry?`
A corner, or 'vertex', is a sharp bend in the polygon.  In 2d, the vertices are where 2 inequalities (lines) cross.  In n dimensions, a vertex is where n inequalities (planes/hyperplanes) are tight at the same time.  2 vertices are neighbors if they share all but 1 of those tight inequalities.

I'm not really sure how it's done, and I don't really care anymore, because I just read that there's better methods than simplex now.  Maybe.  Ok, wait.  I just read that simplex is a 'pivot' method or something.  Like QuickSort.  Quicksort is $O(n^2)$ in the worst case but it's faster than MergeSort in practice.  Everyone says that simplex is fast in practice too.  So maybe stick with it for a while?  It's 'worst case' exponential time.  Hm...

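Some vertices can be created in more than 1 way, meaning more than n inequalities are tight there and different subsets of them give the same point.  These vertices are 'degenerate', which is bad.  Simplex needs to do extra computations on these, or something.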

`What is this thing:  A magic trick called duality?  What are they talking about?`
They say 'where did the 0, 1, 1, 4' thing come from.  Those are multipliers for the four constraints: multiply the constraints by 0, 1, 1, and 4, add them up, and the combined left hand side is exactly the objective function.
So it proves the optimality of the solution.  Note that it doesn't give the solution.  We want a point that tells us what each of the x's are.  This only tells us the best we can do is 3100.  It doesn't tell us how to get to 3100.

Also note that with the chocolate example, you could get bounds that aren't as tight as possible.  It's under the duality section.
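
To see the 0, 1, 1, 4 multipliers at work, here is the arithmetic written out against the four constraints of the chocolate example (the full LP is listed further down in these notes).  Each constraint is multiplied by its multiplier and then the four lines are added:

$$
0 \cdot (x_1) \leq 0 \cdot 200 \\
1 \cdot (x_2) \leq 1 \cdot 300 \\
1 \cdot (x_1 + x_2 + x_3) \leq 1 \cdot 400 \\
4 \cdot (x_2 + 3x_3) \leq 4 \cdot 600 \\
\text{sum: } x_1 + 6x_2 + 13x_3 \leq 300 + 400 + 2400 = 3100
$$

The left hand side of the sum is the objective function, so no feasible point can ever score higher than 3100.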

`Why is integer linear programming exponential time?`
Let's say you're the mayor of a town, and want to hire A police, B firefighters, and C teachers.
Obviously you can only hire an integer amount of each of these.
Now let's say you used a regular linear programming method and got a bunch of decimal values for each job.
You get A=10.6, B = 5.4, C = 12.5.
So you think to yourself: easy, I know I'll need to hire A=10-11, B=5-6, C=12-13.  I'll just check each of those combinations.
But that's basically checking every possible value of a 3 bit binary string.
If you had a hundred jobs, you would have to check every possible value of a 100 bit binary string.
That's $O(2^n)$ time.
Of course, there are plenty of heuristics you could use.
Stuff like 'I know if I hire 11 police, there won't be enough budget for 13 teachers, so all combos of 11 police and 13 teachers are off the table.'
But it's still exponential time in the worst case.
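
To make the blow-up concrete, here is a small Python sketch that lists every round-up/round-down combination for the mayor example.  The function name `rounding_candidates` is made up, and the sketch only counts candidates; it doesn't check budgets or any other constraint.

```python
from itertools import product
from math import ceil, floor

def rounding_candidates(relaxed):
    """Every way of rounding each fractional value down or up."""
    choices = [(floor(value), ceil(value)) for value in relaxed]
    return list(product(*choices))

# A=10.6, B=5.4, C=12.5 from the example above.
candidates = rounding_candidates([10.6, 5.4, 12.5])
count_for_3_jobs = len(candidates)  # 2 choices per job, 3 jobs
count_for_100_jobs = 2 ** 100       # far too many to ever list
```

Each extra job doubles the number of candidates, which is where the $2^n$ comes from.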

`What is going on in the duality section?`
It's saying:
We have these linear inequalities.  
Some nonnegative linear combination of them will have a left hand side that matches our objective function.
If we can get this linear combo of the constraints, its right hand side is an upper bound on our objective function, and if that bound equals the value we computed, we'll know for sure that we actually computed the maximum.
All the y's are the mysterious multipliers that we need in order to make the combined left hand side of these inequalities look like our objective function.
Then we'll have a bound on our objective function.
We want the y's that make the combined right hand side as small as possible, so we can get the tightest bound possible.
So we want to minimize the combined right hand side, which is a linear function of the y's.
Wow, that's a linear programming problem.
So this new linear programming problem that minimizes something and is kind of the opposite of our primal (meaning original) problem actually solves the same thing.
So if we solve this 'dual' problem for all the y's, and plugging those y's into the dual's objective function gives the same value as plugging the x's into our original objective function, we know we got the right answer in the first place.
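
The check described above can be sketched in Python using the chocolate example's numbers.  The function name `dual_bound` is made up, and the point (0, 300, 100) is a feasible point I worked out by hand rather than something quoted from the book.

```python
def dual_bound(c, A, b, y):
    """Upper bound on max c.x (subject to Ax <= b, x >= 0) certified by y.

    Returns None if y is not a valid set of multipliers.
    """
    if any(y_i < 0 for y_i in y):
        return None  # negative multipliers would flip the inequalities
    for j in range(len(c)):
        combined = sum(y[i] * A[i][j] for i in range(len(A)))
        if combined < c[j]:
            return None  # combined left hand side must cover the objective
    return sum(y_i * b_i for y_i, b_i in zip(y, b))

c = [1, 6, 13]
A = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 1, 3]]
b = [200, 300, 400, 600]

tight_bound = dual_bound(c, A, b, [0, 1, 1, 4])   # the book's multipliers
loose_bound = dual_bound(c, A, b, [1, 6, 13, 0])  # valid, but a worse bound
primal_value = sum(c_j * x_j for c_j, x_j in zip(c, [0, 300, 100]))
```

When `primal_value` equals `tight_bound`, neither side can be improved, so both answers are optimal.  The looser multipliers still give a correct bound, just not a useful one.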

So for now, I would think of duality as a way to check your answer.  If you do your answer wrong the first time, and then you do the dual problem, you'll get different solutions.  Then you'll know you either did the primal or the dual problem incorrectly.  I guess it might be possible for you to do both problems incorrectly and end up at the same answer.  But since the problems are really mathematically different, you would have to mess up in 2 different ways and somehow end up at the same answer.  What are the odds of that?

Also people are always calling duality 'beautiful' so maybe there's some 'deeper' meaning here.  I mean, I think it's aesthetically cool.  But usually what people mean by 'beautiful' is that it still holds many secrets or some other such thing.

Might also want to read that 'visualizing duality' section of the book on page 210.

There's also the fact that we have a 'standard' LP problem, which is the whole x's greater than 0, Ax less than or equal to b, maximize this function, etc.  It's easier to code and reason about if we just get all LPs into this standard form before solving them.  This way there's a lot less variation in the requirements for solving them.

`Ok, but how do I actually do the algorithm?`

Let's look at that chocolate example from the 170 textbook:

$$
max\ x_1+6x_2+13x_3 \\
x_1 \leq 200 \\
x_2 \leq 300 \\
x_1+x_2+x_3 \leq 400 \\
x_2+3x_3 \leq 600 \\
x_1 \geq 0 \\
x_2 \geq 0 \\
x_3 \geq 0 \\
$$

It's actually pretty easy to put this linear program into vector form.  Here's the generic vector form of a linear program:

$$
max\ c^Tx \\
Ax \leq b \\
x \geq 0 \\
$$

`Why is x greater than 0 its own constraint?  Why can't we incorporate it into Ax less than b?`

Dunno.  I'll try and answer this later.  Let's keep going.

For our specific example, the corresponding values for regular and vector form are:

$$
c^T = 
\begin{bmatrix}
    1 & 6 & 13
\end{bmatrix}
\\
x = 
\begin{bmatrix}
    x_1 \\
    x_2 \\
    x_3
\end{bmatrix}
\\
A = 
\begin{bmatrix}
    1 & 0 & 0 \\
    0 & 1 & 0 \\
    1 & 1 & 1 \\
    0 & 1 & 3 \\
\end{bmatrix}
\\
b = 
\begin{bmatrix}
    200 \\
    300 \\
    400 \\
    600 \\
\end{bmatrix}
\\
0 = 
\begin{bmatrix}
    0 \\
    0 \\
    0 \\
\end{bmatrix}
$$

You start by choosing a point at which n inequalities are 'tight', where n is the number of variables.  Tight means that the inequality holds with equality: you are at the absolute bound, and can go no further.  This lets us know that we are on a corner of the polygon.  Then you look at 1 of the tight inequalities, 'release' it, and move along the edge that opens up until some other inequality becomes tight.  That's the neighboring vertex.

Ok, the whole 'why is x greater than 0 its own thing?' question is answered because a generic linear program is easiest when you have a bunch of 0 bounds.  So we might not have a 'greater than 0' bound for each variable in x, but if we change the coordinate system so that it does have those bounds, then we're in good shape.  We do this because when we transform the coordinate system so that we have a tight bound at 0 for every $x_i$, that means that we know the origin is a vertex.  So the question would be 'where do we start?'  And the answer would be 'transform the coordinate system so we can start at the origin'.

Maybe put off the whole matrix form until later?

`How do we know what our vertices are?  How do we know where to start?`

We don't know what our vertices are.  What we'll do is transform the LP so that the origin is definitely a vertex.  Then we can start simplex at the origin.

Transform to get the origin as a vertex.
Figure out how to loosen constraints so we can get to a different vertex.
    Loosen constraint
    'Increase' variable until we run into a new constraint.  Now that constraint is 'tight'.
    How do we increase a variable?  Some arbitrary step size?
    No, it's actually just a calculation.  You have a bunch of x's that specify a point.
    You 'release' one of the x's, so now it's a free variable.  
    Given that you know all the other x's values, you just solve the other equations to see which one is now
    'tight', due to you loosening one.
Check whether that vertex is optimal.
But wait, degenerate points might make simplex stall or return something suboptimal.
Here's the solution to that: add a tiny amount like 0.00000000001 to the constraint bounds so that the points are different.
What if the solution has no bound?  Simplex can detect that.
What if there's no feasible solution?  Simplex can detect that.
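
The steps above can be sketched as a small tableau-style simplex in Python.  This is a rough sketch under assumptions, not the book's exact procedure: it assumes the standard form (maximize, `Ax <= b`, `x >= 0`) with every `b[i] >= 0` so that the origin is a valid starting vertex, it does not detect infeasibility, and it handles degenerate ties with Bland's rule (always pick the lowest index) instead of adding tiny amounts.  The function name `simplex` and the list-based tableau are made up for illustration.

```python
from fractions import Fraction

def simplex(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming every b[i] >= 0."""
    m, n = len(A), len(c)
    # One row per constraint: [A | identity for slack variables | b].
    rows = [[Fraction(v) for v in A[i]]
            + [Fraction(int(i == j)) for j in range(m)]
            + [Fraction(b[i])] for i in range(m)]
    # Objective row starts as -c; a negative entry means increasing that
    # variable would still improve the objective.
    obj = [Fraction(-v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]  # at the origin only the slacks are nonzero
    while True:
        # 'Release' a variable: lowest index that can still improve things.
        enter = next((j for j in range(n + m) if obj[j] < 0), None)
        if enter is None:
            break  # no neighbor is better, so this vertex is optimal
        # Increase it until some constraint becomes tight (smallest ratio).
        ratios = [(rows[i][-1] / rows[i][enter], basis[i], i)
                  for i in range(m) if rows[i][enter] > 0]
        if not ratios:
            raise ValueError("the linear program is unbounded")
        leave = min(ratios)[2]
        # Pivot: rewrite every row in terms of the new set of tight constraints.
        pivot = rows[leave][enter]
        rows[leave] = [v / pivot for v in rows[leave]]
        for i in range(m):
            if i != leave and rows[i][enter] != 0:
                factor = rows[i][enter]
                rows[i] = [v - factor * p for v, p in zip(rows[i], rows[leave])]
        factor = obj[enter]
        obj = [v - factor * p for v, p in zip(obj, rows[leave])]
        basis[leave] = enter
    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = rows[i][-1]
    return obj[-1], x

# The chocolate example from earlier in these notes.
value, x = simplex([1, 6, 13],
                   [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 1, 3]],
                   [200, 300, 400, 600])
# value is 3100 at x = (0, 300, 100)
```

Using `Fraction` keeps the arithmetic exact, which avoids worrying about floating point error while learning how the pivots work.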

`The book says 'if the x's are tight, it's definitely a vertex'.  How do we know that for sure?  Imagine a feasible region of a square sitting on the x axis.  Then the origin is in the middle.  It's not a vertex, yet the constraints are tight.  So isn't this untrue?`

It would be true that a square sitting on the x axis wouldn't have the origin as a vertex.  However, that square also wouldn't have an 'x geq 0' constraint, so only 1 constraint ('y geq 0') is tight at the origin.  Remember that a vertex in 2d is where at least 2 constraints meet.

`Wait.  If we know that a vertex is where at least 2 constraints meet, why can't we just calculate all the vertices?  Why do we have to do these transformations so that we're always at the origin?`

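We could, but there can be exponentially many of them: with m inequalities in n variables there are up to 'm choose n' sets of inequalities to intersect and check.  Also, you need to go back and change what you said.  Simplex just calculates a neighboring vertex, and if it's better, it goes there.  It doesn't calculate all of its neighbors, necessarily.  It's a greedy algorithm.  It just calculates neighbors until it finds a better one and goes there.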
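
For a tiny LP you really can calculate all the vertices, which makes a nice sanity check.  Here is a brute-force Python sketch for the chocolate example: it tries every set of 3 constraints, solves for the point where they are all tight, and keeps the feasible ones.  The helper names (`det3`, `solve_tight`, `is_feasible`, `all_vertices`) are made up, and the 3x3 solve uses Cramer's rule only because it is short.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (coefficients, bound), meaning coefficients . x <= bound.
# The last three rows encode x_i >= 0 as -x_i <= 0.
CONSTRAINTS = [
    ((1, 0, 0), 200), ((0, 1, 0), 300), ((1, 1, 1), 400), ((0, 1, 3), 600),
    ((-1, 0, 0), 0), ((0, -1, 0), 0), ((0, 0, -1), 0),
]
OBJECTIVE = (1, 6, 13)

def det3(m):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve_tight(three):
    """Point where three constraints hold with equality, or None if there isn't one."""
    rows = [coeffs for coeffs, _ in three]
    rhs = [bound for _, bound in three]
    d = det3(rows)
    if d == 0:
        return None  # the three planes don't meet in a single point
    point = []
    for col in range(3):
        replaced = [[rhs[r] if k == col else rows[r][k] for k in range(3)]
                    for r in range(3)]
        point.append(Fraction(det3(replaced), d))
    return tuple(point)

def is_feasible(point):
    return all(sum(a * x for a, x in zip(coeffs, point)) <= bound
               for coeffs, bound in CONSTRAINTS)

def all_vertices():
    found = set()
    for three in combinations(CONSTRAINTS, 3):  # 7 choose 3 = 35 candidates
        point = solve_tight(three)
        if point is not None and is_feasible(point):
            found.add(point)
    return found

def score(point):
    return sum(c * x for c, x in zip(OBJECTIVE, point))

vertices = all_vertices()
best = max(vertices, key=score)
best_value = score(best)
```

With 7 constraints and 3 variables this is only 35 candidates, but the count grows combinatorially with the size of the LP, which is why simplex walks from neighbor to neighbor instead.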


`Notes from PhD Grind`

Page ??? (The entirety of year 1):  What you might end up like if you pursue a Ph.D

Page 22:  Send cold emails.

What’s the problem?
What’s my proposed solution?
What compelling experiments can I run to demonstrate the effectiveness of my solution?

Page 23:

Professors are motivated by having their names appear on published papers, and computer science conference papers usually need strong experiments to get accepted for publication. Thus, it’s crucial to think about experiment design at project inception time.

Page 30:

I cannot re-emphasize this point enough times: Properly calibrating your pitch to the academic sub-community you’re targeting is crucial for getting a paper accepted.

Page 35-36:

Make sure you and your advisor actually have similar goals in mind.

Page 37-38:

The horror.  Years and years for no results.  Also, remember that this is Stanford.  The place you want to go.  Also note that he’s making ‘incremental’ improvements to the already existing Klee stuff.  That’s basically what you want to do.  Just assemble a whole bunch of stuff that already exists.  You’re not really doing ground-breaking work, just implementing other people’s work.  Potentially with modifications, but still.  You can’t make a paper out of this.  Well, maybe you could, but it would be difficult.  And you would be on a timer.  And you would have to capitulate to the professor.  Ugh.  There’s no time for that.  You don’t need to publish a paper, you need to publish a language.  But wasn’t Julia someone’s research project?  Hm… doesn’t seem like it.  But you want control.  You need to be able to make all of the decisions, right now.  What you want to make is a product.  It’s not truly an exploration of new ideas, it’s a synthesis of ideas into something real.  And do those people go to grad school?  No, they just make it.  You can’t be wasting your time with some feet-on-desk professor’s half-assed ideas.

Page 43:
Marketing buzzwords.  Required to publish papers.

Page 47:
Online submission forms are black holes. Anything is better than blindly submitting online.

53:  Search the web for related work.  Very important.

58:  It turns out that ignoring your instructors and doing whatever you want is what makes you the most productive.  As you already knew.  “Those three years—my latter half of grad school—were the most creative and productive of my life thus far, in stark contrast to my meandering first half of grad school”

63:  If you are a computer science researcher, your work will never be directly used in practice.  You simply don’t have the time to get things up to the production-level standards required for people to actually want to use your thing.  “To convince someone to try IncPy, I would need to guarantee that it works better than regular Python in all possible scenarios.”  Well, that’s your language, right?  All you have to do is put colons in front of all of your variable assignments, then instead of doing python <myfile>, you do sanic <myfile>.  Of course, gross oversimplification, but still.

65:  Dependency conflicts are annoying.  The more integrated your stuff is with code that already exists, the more screwed you are.  What kinds of things will you have to depend on?

69:  Your scholarships are on a timeline so you’re on your own if you don’t publish a bunch of papers right away.  So they make you waste your time with useless classes and bad research, then tell you that you can’t graduate unless you shit out a bunch of papers.

70:  This incPy thing turned out to be one of the components of a jupyter notebook, in a way.  You run some code, you see the output, you can refer to that same code later.  This makes me think that your language needs something like that.  That might be difficult considering your language is compiled.  Or would it be difficult?  It looks like jupyter works with Go, so it shouldn’t be too bad.

73:  Jupyter isn’t just really convenient, in a lot of ways it’s essential.  You need to be able to reproduce your research, not just present it.  How can you run programs really really fast when you’re in a Jupyter notebook?

74:  ‘Why not make it a general application?’  Because the larger your audience is, the less you will be able to deliver to them.

74-75:  ‘Why hadn’t anyone thought of this yet?’  He says he googles it and doesn’t get much back, but this seems like Containers to me.  It’s just containers.  

76:  “I’ll always miss those purer times. In my current job, there’s no way I can block off three weeks just to code non-stop!”  HMMMMMM….  Maybe get a different job?  Well, he’s probably happy where he is, but you wouldn’t be.
77:  “This is a very important point. It’s not our job as academics to ship polished products; that’s the role of companies.”  You want to ship a polished product.

80-81:  He got an offer to just work on his open source project at Google.  Just get paid money to do what he had been previously doing.

87:  This isn’t the first place this is mentioned, but “And without motivated students to grind through the tough manual labor, it’s impossible to get respectable publications.”  Your thing will consist of a lot of tough manual labor.  

88:  Creative freedom is not to be found anywhere.  Freedom to simply work on what interests you is also not to be found anywhere.

90:  Living arrangement matters for your productivity.

92-93:  An example of the constant state you want to achieve.  Also note that this guy made that one environment diagram program that is useful.  Getting used to new things is a difficult, frustrating process.  You have to push through that to get to the good part.  Of particular importance:  “After years of grinding on uncertain and failed projects earlier in grad school, I now felt invigorated working intensely towards a target that I knew I could feasibly achieve.”  You’re motivated by this compiler thing because it seems so obvious, so simply achievable to you.  Maybe not an easy task, but definitely a task that is possible.

100:  Produce results.  Not just for others, but for yourself, in order to stay motivated.  Also read the rest of the lessons, they seem very insightful.  Especially point 20.  Sad to say that effort alone isn’t enough.  You need to apply the right kind of effort.

Last Page:  He says that it was fulfilling, etc, etc.  But he makes a compelling argument for you not to pursue a graduate degree.  Working towards a graduate degree will force you to work hard.  Do you need someone to force you to work hard?  No.


`Dynamic programming`

According to the book, supposedly every dynamic programming problem is actually finding the shortest path in a DAG, if it's a minimization problem, and the longest path in a DAG if it's a maximization problem.  Remember in DAGs these problems are basically the same: negate the edge lengths and one turns into the other.

Imagine an array putting the nodes in linearized order.
Now, there's also a bunch of pointers from earlier nodes to later nodes.
Remember that a DAG always has at least 1 source and at least 1 sink.

What is a DAG?
A directed acyclic graph.  It has at least 1 source and at least 1 sink.  Since it's acyclic, once you leave a node, you know you will never come back to that node.  Any path in the DAG uses each node either 0 or 1 times.  That makes it easy to calculate the longest path.  The shortest path is also easier to calculate than in a general graph.
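
Here is a short Python sketch of the longest path calculation in a DAG, processing the nodes in linearized (topological) order.  The function name `longest_path_lengths`, the numbered-node format, and the example edges are all made up for illustration.

```python
from collections import deque

def longest_path_lengths(num_nodes, edges, start):
    """Longest distance from start to every node of a DAG.

    edges is a list of (u, v, weight) with nodes numbered 0..num_nodes-1.
    Unreachable nodes keep the value -infinity.
    """
    adjacent = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for u, v, weight in edges:
        adjacent[u].append((v, weight))
        indegree[v] += 1

    # Linearize: repeatedly remove a node that has no remaining incoming edges.
    order = []
    queue = deque(i for i in range(num_nodes) if indegree[i] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adjacent[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Every edge points forward in the order, so one pass is enough.
    best = [float("-inf")] * num_nodes
    best[start] = 0
    for u in order:
        if best[u] == float("-inf"):
            continue  # not reachable from start
        for v, weight in adjacent[u]:
            best[v] = max(best[v], best[u] + weight)
    return best

example_edges = [(0, 1, 3), (0, 2, 1), (1, 3, 2), (2, 3, 7), (1, 2, 1)]
distances = longest_path_lengths(4, example_edges, 0)
```

Each node and edge is handled once, so this runs in time proportional to the number of nodes plus edges.  Swapping `max` for `min` and `-inf` for `inf` gives shortest paths instead.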

The thing in the book seems to calculate all the possible paths.  Wouldn't this be exponential time, and therefore slower than BFS?

Or maybe this DAG includes possible negative edge lengths?

Damn, you really need to relearn path algorithms.

Why is longest path otherwise hard to compute?
Well, that really depends.  If there's a positive cycle, there is no longest path, since you can just go around and around forever.  The 'longest path' problem usually refers to the longest path between s and t that doesn't repeat a node, and in a general graph that version is NP-hard.

What are graphs?
Vertices with edges connecting them.
Why are they so pervasive?
Because a node represents a thing, and an edge represents some relationship between 2 things.
The easiest example is a map.
But there's other stuff too.
Like the student exams example.
Do the vertices represent students?  Tests?  timeslots?
How do you choose what the vertices and edges represent?
Sometimes edges are the ability to do things.
Sometimes it's the opposite.  Edges are constraints.