This is a worked-through solution to [D&D.Sci (Easy Mode): On The Construction Of Impossible Structures](https://www.lesswrong.com/posts/Syfq6MwgdZhHg9vha/d-and-d-sci-easy-mode-on-the-construction-of-impossible).

In [5]:
]settings workdir ,[DYALOG]/Library/Conga   ⍝ Make sure we can find HttpCommand

In [13]:
]box on -fns=on
]rows on -fold=3 -fns=on

In [7]:
]get ./D&D.Sci/data.csv

In [8]:
]get ./D&D.Sci/architect_proposals.txt

In [21]:
⍴data

In [22]:
5↑data

In [23]:
≢architect_proposals

In [32]:
800↑architect_proposals

From this, we can see that we need a model that takes:

- background
- characteristics
- type
- materials

and predicts both the cost and whether the structure is impossible.

It looks like we ought to normalise the data along each of its dimensions, which we abbreviate as follows:

- bg: architect background
- typ: structure type
- mat: construction materials
- blu: characterisation of blueprints
- im: is structure impossible?
- cst: cost of structure

We can see that the `mat` lists overlap: the same material appears in more than one list. Let's see whether our other fields are pure categories or whether they overlap too.

In [26]:
10(⊂⍤?∘≢⌷⊢)data

It looks like they are all pure (each can be encoded as indices into its unique items) except for materials, where each entry is two materials selected from those available.
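As a minimal sketch of what "encoded as indices into the unique items" means, here is a made-up three-item column (the vector `bg` is illustrative, not taken from `data`), assuming the default `⎕IO←1`:

```apl
⍝ Hypothetical toy column, not the real data
bg←'Self-Taught' 'Apprenticed under B. Johnson' 'Self-Taught'
u←∪bg     ⍝ unique categories, in order of first appearance
u⍳bg      ⍝ position of each item within u: 1 2 1
```

The index vector together with `u` carries the same information as the original column, which is what makes a field a pure category.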

How can we inspect these data? A hint is that anything more than a spreadsheet is overkill, so per-category counts and averages should be enough.



In [29]:
d←1↓data   ⍝ Drop the header row

In [31]:
(↓⍉d[;⍳4]){⍺,≢⍵}⌸¨⊂d[;5]   ⍝ Row counts per category for each of the first four columns

In [9]:
(h d)←{(1⌷⍵)(1↓⍵)}data   ⍝ Split header and data

In [18]:
(⌈/,⌊/)d[;h⍳⊂'Cost of Structure']

The range is quite large, so we will use the logarithm of costs in our analysis.

In [10]:
log_cost←⍟d[;h⍳⊂'Cost of Structure']

Now we look at creating indicator variables for our categorical variables.

We create indicator variables for each of the possible materials.
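A minimal sketch of such an indicator matrix, on made-up inputs (the names `u` and `p` and the material 'Steel' are assumptions for illustration only):

```apl
⍝ Hypothetical toy inputs
u←'Dreams' 'Nightmares' 'Steel'                ⍝ available materials
p←('Dreams' 'Steel')('Nightmares' 'Dreams')    ⍝ one pair of materials per structure
u∊⍤1↑p    ⍝ one row per structure, one column per material
⍝ 1 0 1
⍝ 1 1 0
```

Each row has exactly two 1s, marking the two materials that structure uses.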

In [17]:
unique_mat←∪⊃,/pairs←(⊂1 3)∘⌷¨' '(≠⊆⊢)¨d[;h⍳⊂'Required Construction Materials']
materials←unique_mat∊⍤1↑pairs
10↑materials

In [25]:
h

In [29]:
+/'Yes'∘≡¨d[;h⍳⊂'Is Structure Impossible?']                           ⍝ How many impossible structures in our data?
≢∪d[;1 2 3 4]⌿⍨'Yes'∘≡¨d[;h⍳⊂'Is Structure Impossible?']             ⍝ Are there particular combinations that lead to impossible structures?
≢∪(materials,d[;1 2 4])⌿⍨'Yes'∘≡¨d[;h⍳⊂'Is Structure Impossible?']   ⍝ What about if materials are considered separately?

Let's get some percentages of our categorical variables against whether they create impossible structures.
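The grouped averages used in this analysis come from the key operator `⌸`. A minimal sketch on made-up data (the names `k` and `f` are illustrative):

```apl
⍝ Hypothetical toy data: one key and one 0/1 flag per row
k←'ABABA'
f←1 0 1 1 0
k,∘(+⌿÷≢)⌸f    ⍝ each result row: a unique key, then the mean of its flags
⍝ A has flags 1 1 0 (mean 2÷3); B has flags 0 1 (mean 0.5)
```

Because the flags are 0 or 1, the mean is the fraction of rows in each group for which the flag is set.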

In [36]:
n←≢d
d[;h⍳⊂'Background of Architect'],∘(+⌿÷≢)⌸'Yes'∘≡¨d[;h⍳⊂'Is Structure Impossible?']

So our impossible structures were created by apprentices of P. Stamatin and B. Johnson, and some self-taught architects. Under what conditions do self-taught architects create impossible structures?

In [46]:
stm←{(⊂⍒⊢/⍵)⌷⍵}(materials,'Self-Taught'∘≡¨d[;h⍳⊂'Background of Architect']),∘(+⌿÷≢)⌸'Yes'∘≡¨d[;h⍳⊂'Is Structure Impossible?']
(unique_mat,'Self-Taught' 'pct impossible')⍪stm

It looks like self-taught architects who use dreams are highly likely to create impossible structures.

We suspect structure type and materials might contribute to cost. Let's investigate that relationship.

In [50]:
⍝ Mean cost by structure type
d[;h⍳⊂'Proposed Structure Type'],∘(+⌿÷≢)⌸d[;h⍳⊂'Cost of Structure']

⍝ Mean cost for each combination of materials (unique rows of the materials matrix), highest first
(unique_mat,⊆'avg cost')⍪{(⊂⍒⊢/⍵)⌷⍵}materials,∘(+⌿÷≢)⌸d[;h⍳⊂'Cost of Structure']

In [62]:
{(⊂⍒⊢/⍵)⌷⍵}(↑pairs),∘(+⌿÷≢)⌸d[;h⍳⊂'Cost of Structure']

In [63]:
materials[;unique_mat⍳⊂'Nightmares'],∘(+⌿÷≢)⌸d[;h⍳⊂'Cost of Structure']

So we can see that nightmares are, on average, markedly more expensive than any other kind of material.

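We can also attempt a linear regression of $\log(cost)$ on the material indicators: the mean log cost of each material combination is modelled as the sum of one coefficient per material used.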
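As a rough illustration of fitting such a model with matrix divide `⌹`, here is a toy problem with made-up numbers (three materials, log costs chosen to be exactly additive; `X`, `y` and `c` are illustrative names):

```apl
⍝ Hypothetical toy problem, not the real data
X←4 3⍴1 1 0  1 0 1  0 1 1  1 1 0    ⍝ indicator rows: which two materials are used
y←3 4 5 3                           ⍝ log cost of each row
c←y⌹X                               ⍝ least-squares coefficients: 1 2 3, up to rounding
*1 0 1+.×c                          ⍝ predicted cost for materials 1 and 3: e to the power 4
```

The inner product `+.×` adds the coefficients of the selected materials, and `*` undoes the logarithm to return to the original cost scale.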

In [56]:
mat_cost←materials,∘(+⌿÷≢)⌸⍟d[;h⍳⊂'Cost of Structure']   ⍝ Mean log cost per material combination
⎕←coeff←(⊢/mat_cost)⌹¯1↓⍤1⊢mat_cost                      ⍝ Least-squares coefficient per material

In [58]:
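*0 0 1 1 0 0+.×coeff   ⍝ Predicted cost for the 3rd and 4th materials in unique_mat
*0 1 0 0 1 0+.×coeff   ⍝ Predicted cost for the 2nd and 5th materials in unique_mat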

So those most likely to create impossible structures are apprentices of P. Stamatin and B. Johnson, or self-taught architects who use dreams. We want to avoid proposals to build with nightmares because they are expensive.

In [65]:
300↑architect_proposals

The pattern is:
```
Architect N, who was [background], presents [blueprint] proposing to build a [type] of [materials]
```
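A minimal sketch of pulling the fields out of one sentence of this shape. The sentence below is made up to follow the pattern rather than copied from the file, and the sketch assumes a one-word structure type and two one-word materials joined by "and":

```apl
⍝ Hypothetical sentence, not from architect_proposals.txt
s←'Architect Z, who was Self-Taught, presents rough blueprints proposing to build a Tower of Dreams and Steel'
f←','(≠⊆⊢)s              ⍝ split on commas: three fields
bg←9↓2⊃f                 ⍝ drop ' who was ' (9 characters): 'Self-Taught'
w←' '(≠⊆⊢)3⊃f            ⍝ words of the third field
(⊂bg),w[(≢w)-4 2 0]      ⍝ background, type, first material, second material
```

Counting words back from the end of the sentence avoids having to know how many words describe the blueprints.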

In [77]:
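ap←⎕CSV architect_proposals 'S' ⍬   ⍝ Split each proposal into comma-separated fields
ap[;2]↓⍨←8   ⍝ Drop "who was "
ap[;3]↓⍨←9   ⍝ Drop "presents "
n←≢'blueprints proposing to build a '   ⍝ Length of the fixed phrase before the structure type
ap[;3]↓⍨←n+' blueprints'∘(⍸⍷)¨ap[;3]   ⍝ Drop the blueprint description and the fixed phrase
⎕←proposals←ap[;1 2],↑(⊂1 3 5)∘⌷¨' '(≠⊆⊢)¨ap[;3]   ⍝ Name, background, type and the two materials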

In [80]:
imp←proposals[;2]∊'Apprenticed under P. Stamatin' 'Apprenticed under B. Johnson'
proposals[⍒imp;]

Five proposals, all from apprentices of P. Stamatin or B. Johnson, are guaranteed to create impossible structures, but one of them intends to use nightmares, which are too expensive. The four proposals to accept are D, E, H and K.