2013 Strange Loop Unsession

Jason Wolfe edited this page Jan 13, 2016 · 2 revisions
(ns strangeloop-schema-unsession
  "An informal introduction to prismatic/schema"
  (:require
   [schema.core :as s]))

;;; Thanks for coming!

;; I'm Jason Wolfe.

;; I work at a small company called Prismatic.

;; We make real-time, personally ranked newsfeeds.

;; See my teammate Jenny Finkel's keynote first thing Wednesday for details.

;;; Clojure @ Prismatic

;; We're a 99% Clojure(Script) shop.

;; A handful of open-source projects
;; - Plumbing/Graph, which I talked about at Strange Loop 2012
;; - Dommy for ClojureScript dom manipulation
;; - Hiphip (array!) for arrays
;; - And now Schema

;; Beyond public GitHub, about 100klocc supporting tens of services
;; - web and social network crawlers
;; - document analysis
;; - distributed indices
;; - api for real-time newsfeeds
;; - web server with client-side cljs

;; Possible with 6 engineers partially because of Clojure, and in particular:
;; - (almost) everything can be manipulated as data
;; - ubiquitous sequence and map abstractions

;;; With great power ...

;; One of the biggest difficulties we've encountered with our growing Clojure
;; codebase and team is the overhead of understanding the kind of data
;; being passed around and manipulated.

(defn update-share-counts [share-counts updates]
  (reduce
   (fn [result {:keys [user-id share-type delta]}]
     (update-in result
                [share-type (str user-id)]
                (fnil + 0) delta))
   share-counts updates))

;; What is a share-counts?  How about updates?

;; Maybe a docstring will help?

(defn update-share-counts
  "Increment share-counts according to the share actions in updates:
    share-counts: map from user-id to map of share-type
    (must be one of :twitter, :facebook, or :email) to
    count of number of shares (a long)
    updates: sequence of maps {:user-id :share-type :delta}, where delta is
    amount to increment share-type by
  returns updated share-counts reflecting shares in updates"
  [share-counts updates]
  ;; etc
  )

;; This is undoubtedly an improvement.  But:
;; - not machine readable
;;   - can't help us find bugs
;;   - prone to bit-rot
;; - not (very) human readable either...
;;   - many ways to phrase this content
;;   - hard to adopt uniform standards and get good at easily reading such
;;     descriptions
;;   - no abstraction -- if data types are reused, becomes very repetitive

;; Enter Schema

(def ShareType (s/enum :twitter :facebook :email))
(def ShareCounts {(s/named Long 'user-id) {ShareType (s/named Long 'count)}})

(s/defn update-share-counts :- ShareCounts
  [share-counts :- ShareCounts
   updates      :- [{:user-id Long :share-type ShareType :delta Long}]]
  ;; etc
  )

;; By default, functionally identical to the normal defn ...
;; but with crisp documentation that is:
;; - easy to read (IMO)
;; - precise
;; - reusable
;; - composable

;; ...

;;; Moreover, Schemas are composable data, and print like their definitions:

;; ==> {java.lang.Long {(enum :facebook :email :twitter) java.lang.Long}}

;;; Schemas can be used for validation, and provide nice error messages

;; s/check returns nil for success, or something that looks like the
;; bad parts of your data

(s/check ShareCounts
         {12 {:twitter 10 :facebook 15}})
;; ==> nil

(s/check ShareCounts
         {12 {:twitter 10}
          13 {:twitter 1.4}
          "fred" {:facebook 10}})
;; ==> {13 {:twitter (not (instance? java.lang.Long 1.4))}
;;      (not (instance? java.lang.Long "fred")) invalid-key}

;; s/validate is like (assert (not (s/check ...)))

(try
  (s/validate ShareCounts
              {12 {:twitter 10}
               13 {:twitter 1.4}
               "fred" {:facebook 10}})
  (catch Exception e e))
;; ==> #<RuntimeException: Value does not match schema:
;;      {(not (instance? java.lang.Long "fred")) invalid-key
;;       13 {:twitter (not (instance? java.lang.Long 1.4))}}>
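
;; A small sketch of how s/validate composes, assuming (as in current Schema
;; releases) that it returns its argument unchanged when the data matches.
;; CountsByUser and total-shares are hypothetical names used only here.

(require '[schema.core :as s])

(def CountsByUser {Long {clojure.lang.Keyword Long}})

(defn total-shares
  "Validates counts, then sums every share count across all users."
  [counts]
  (->> (s/validate CountsByUser counts)
       vals
       (mapcat vals)
       (reduce + 0)))

(total-shares {12 {:twitter 10} 13 {:twitter 5}})
;; ==> 15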

;;; And if you attach schemas to your functions with s/defn or s/fn, you can
;; optionally turn on fn schema validation at runtime (e.g., in tests):

(try
  (s/with-fn-validation (update-share-counts
                         {12 {:fakeblock 10}}
                         [{:user-id 10 :share-type :twitter :delta 12.0}]))
  (catch Exception e e))
;; ==> #<RuntimeException: Input to update-share-counts does not match schema:
;;     [(named {12 {(not (#{:facebook :email :twitter} :fakeblock)) invalid-key}} share-counts)
;;      (named [{:delta (not (instance? java.lang.Long 12.0))}] updates)]>

;;; Outline

(def ThisUnsession
  {0 (s/both Short (s/eq :tour))
   1 (s/enum :why? :how?)
   2 (s/pred next)
   3 (s/named s/Any 'discussion)
   double [(s/either (s/eq :comments) (s/eq :questions))]})

;;; Meet Schema

;;; Schema is a lightweight Clojure(Script) library for declarative data shape
;; description and validation.

;; We love the simplicity, flexibility, and pithiness of Clojure functions.

;; But as our codebase and team grew, we increasingly found the need for equally
;; crisp declarations of key data shapes, for readability and maintainability.

;;; Start simple: all type hints are valid Schemas:

(s/check String "good")
;; ==> nil

(s/check String :bad)
;; ==> (not (instance? java.lang.String :bad))

(s/check long 12)
;; ==> nil

;;; A few fancier 'leaf' schemas:

(s/check s/Any :whatever)
;; ==> nil

(s/check (s/enum :a :b :c) :a)
;; ==> nil

(s/check (s/enum :a :b :c) :Z)
;; ==> (not (#{:a :c :b} :Z))

(s/check (s/pred odd?) 3)
;; ==> nil

;;; Higher-order schemas

(s/check (s/maybe String) "asdf")
(s/check (s/maybe String) nil)
;; ==> nil

(s/check (s/either String long) "a")
(s/check (s/either String long) 1)
;; ==> nil

(s/check (s/both long (s/pred odd?)) 11)
;; ==> nil

(s/check (s/both long (s/pred odd?)) 12)
;; ==> (not (#<core$odd_QMARK_ clojure.core$odd_QMARK_@4c6ed9d1> 12))

(s/check (s/named long 'UserID) "bob")
;; ==> (named (not (instance? java.lang.Long "bob")) UserID)

;;;;; Data Structures

;;; Sequences

;; [schema] is a uniform sequence

(s/check [String] ["abc" "foo" "123"])
;; ==> nil

(s/check [String] ["abc" :foo "123"])
;; ==> [nil (not (instance? java.lang.String :foo)) nil]

;; positional constraints can be expressed with s/one (and s/optional):

(def ScoredLabel
  [(s/one String "label") (s/one Double "score")])

(s/check ScoredLabel ["a" 1.0])
;; ==> nil

(s/check ScoredLabel [:foo])
;; ==> [(named (not (instance? java.lang.String :foo)) "label")
;;      (not (present? "score"))]

;;; Maps

;; {key-schema val-schema} is a uniform map

(s/check {Long [String]} {1 ["a" "b"] 2 []})
;; ==> nil

;; specific key constraints can be expressed with s/required-key and
;; s/optional-key.

(def ScoredLabelMap
  {(s/required-key :label) String
   (s/optional-key :score) Double})

(s/check ScoredLabelMap {:label "a"})
(s/check ScoredLabelMap {:label "a" :score 1.0})
;; ==> nil

(s/check ScoredLabelMap {:label "a" :another-key :another-val})
;; ==> {:another-key disallowed-key}

;; the two can also be combined, and
;; s/required-key can be omitted for keywords

(s/check {:foo String Long Long} {:foo "a" 1 2 3 4})
;; ==> nil

;;; Complex data shapes can be built up from components, and checked
;; with helpful error messages

(def StampedNames
  {:date Long
   :names [String]})

(def ScoredLabel
  [(s/one String "label") (s/one Double "score")])

(defn OddNumber [number-type] ;; like generics, kinda
  (s/both number-type (s/pred odd?)))

(def FooBar
  {:stamped-strings (s/maybe StampedNames)
   (s/optional-key :scored-labels) [ScoredLabel]
   String (OddNumber Long)})

(s/check FooBar
         {:stamped-strings nil})
;; ==> nil

(s/check FooBar
         {:stamped-strings {:date 123 :names ["foo" "bar"]}
          :scored-labels [["label1" 1.0] ["label2" 2.0]]
          "another-key" 11})
;; ==> nil

(s/check FooBar
         {:stamped-strings {:date 123 :names ["foo" :bar]}
          :scored-labels [["label1" 1.0] ["label2"]]
          "another-key" 12})
;; ==> {:stamped-strings {:names [nil (not (instance? java.lang.String :bar))]}
;;      :scored-labels [nil [nil (not (present? "score"))]]
;;      "another-key" (not (odd? 12))}

;;; Schema also provides a natural way to attach and validate schemas on
;; defrecord fields:

(s/defrecord RStampedNames
    [date :- s/Int
     names :- [s/String]])

(s/check RStampedNames
         (->RStampedNames 10 ["a" :b]))
;; ==> {:names [nil (not (instance? java.lang.String :b))]}

;;; As well as a way to provide schemas for function inputs and outputs

(s/defn stamped-names :- RStampedNames
  [names :- [s/String]]
  (->RStampedNames (System/currentTimeMillis) names))

(s/explain (s/fn-schema stamped-names))
;; ==> (=> (record strangeloop_schema_unsession.RStampedNames
;;                 {:date Int, :names [java.lang.String]})
;;         [java.lang.String])

(stamped-names ["a" :b])
;; ==> {:date 1379537133893, :names ["a" :b]}

(comment
  (s/with-fn-validation
    (stamped-names ["a" :b]))
  ;; java.lang.RuntimeException: Input to stamped-names does not match schema:
  ;;  [(named [nil (not (instance? java.lang.String :b))] names)]
  )

;;; A powerful subset of Schemas (json+) can be shared between Clojure
;; and ClojureScript
;; - enables schema sharing of API inputs and outputs
;; - less code duplication between client and server, better team communication

(def StampedNames
  {:date s/Int
   :names [s/String]})

;;; Design Goals

;;; #1: Schemas should be simple data.

;; Schemas are composable

;; Schemas are inspectable and transformable
;; - and so are validation errors
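
;; A minimal sketch of the two points above, using only ordinary map
;; functions plus s/check. BaseEvent and SourcedEvent are hypothetical names.

(require '[schema.core :as s])

(def BaseEvent {:date Long :names [String]})

;; A map schema is a plain map, so assoc builds a new schema from an old one.
(def SourcedEvent
  (assoc BaseEvent (s/optional-key :source) String))

(s/check SourcedEvent {:date 1 :names ["a"] :source "web"})
;; ==> nil

;; The error for a map is itself a map, keyed by the offending keys.
(def date-error
  (s/check SourcedEvent {:date "today" :names ["a"]}))

(keys date-error)
;; ==> (:date)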

;; Validation is based on a simple, open protocol, so you can make your
;; own schema types and combinations

(extend-protocol s/Schema
  ;; #+cljs cljs.core.PersistentHashSet #+clj ;; if this were a cljx file...
  clojure.lang.APersistentSet
  (check [this x]
    (or (when-not (set? x)
          (schema.macros/validation-error this x (list 'set? (schema.utils/value-name x))))
        (when-let [out (seq (keep #(s/check (first this) %) x))]
          (set out))))
  (explain [this] (set [(s/explain (first this))])))

;; Only ~600 LOC for Clojure(Script) schemas
;; (plus a few hundred more for s/defrecord and s/defn)

;;; #2: Compared to a docstring or comment with the same information,
;; a schema should always be less hassle to write, and easier to read.

;; Schemas should gracefully extend built-in type hints
;; - provided next to arguments and return types, on defrecords and defn
;; - type hints are valid schemas, and simple schemas act as type hints
;; - more complex schemas can be defined inline

;; Schemas should be optional, and never constrain what you can write

;; Schemas should be able to express arbitrary constraints

;; Validation should be off by default, so you don't have to think twice
;; about performance
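
;; A minimal sketch of the "arbitrary constraints" goal, assuming only s/pred
;; and s/check: any one-argument predicate can act as a schema. descending?
;; and DescendingScores are hypothetical names.

(require '[schema.core :as s])

(defn descending?
  "True when scores is empty or never increases from left to right."
  [scores]
  (or (empty? scores) (apply >= scores)))

(def DescendingScores
  (s/pred descending?))

(s/check DescendingScores [3.0 2.5 1.0])
;; ==> nil

(s/check DescendingScores [1.0 3.0])
;; ==> a non-nil description of the failed predicate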

;;; #3: Schemas should be incrementally useful.

;; First, define schemas for key data types in your namespace
;; - documentation
;; - manual data validation entering/exiting a system, with nice error msgs
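
;; One way this first step might look at a system boundary: a sketch, not a
;; prescribed pattern. ShareUpdate and parse-share-update are hypothetical
;; names; only s/enum and s/check come from Schema.

(require '[schema.core :as s])

(def ShareUpdate
  {:user-id Long
   :share-type (s/enum :twitter :facebook :email)
   :delta Long})

(defn parse-share-update
  "Returns [update nil] when raw matches ShareUpdate, else [nil error]."
  [raw]
  (if-let [error (s/check ShareUpdate raw)]
    [nil error]
    [raw nil]))

(parse-share-update {:user-id 12 :share-type :twitter :delta 1})
;; ==> the update, paired with a nil error

(parse-share-update {:user-id 12 :share-type :myspace :delta 1})
;; ==> nil, paired with an error map that has a :share-type entry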

;; Next, annotate key public functions in your namespace with schemas
;; - better documentation
;; - turn on validation at test-time to catch 'type-like' bugs
;;   (or run-time, if you like)
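
;; A sketch of test-time validation, assuming clojure.test: a fixture wraps
;; every test in s/with-fn-validation, so s/defn schemas are checked in tests
;; only. label-of and validate-schemas-fixture are hypothetical names.

(require '[schema.core :as s]
         '[clojure.test :refer [deftest is use-fixtures]])

(s/defn label-of :- String
  [scored :- {:label String}]
  (:label scored))

(defn validate-schemas-fixture
  "Runs the given test function with fn schema validation turned on."
  [run-test]
  (s/with-fn-validation (run-test)))

(use-fixtures :once validate-schemas-fixture)

(deftest label-of-test
  (is (= "a" (label-of {:label "a"})))
  ;; Outside this fixture the call below would quietly return :oops.
  (is (thrown? Exception (label-of {:label :oops}))))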

;; You could go on to annotate everything...
;; - but if you want a full-blown type system, you want core.typed
;; - core.typed can do static validation based on consistency of annotations
;;   throughout your program, with no runtime cost
;; - Schema is designed to make the most of a few annotations in critical places
;; - But there may be interesting synergies...

;;; What's next?

;;; Generate core.typed annotations on functions?

;;; Generate test data from schemas?

;;; Generate model classes for clients and even full client APIs (WIP!)

(def Interest {:type (s/enum "topic" "publisher" "user") :key s/String})

(require '[plumbing.core :as plumbing])

(plumbing/defnk interests$update
  {:methods      [:post]
   :query-params {}
   :body         {:updates [{:op (s/enum "add" "remove")
                             :interest Interest}]}
   :description  "Update the interests of the current user"
   :returns      {:interests [Interest]}}
  [env user-store [:request user [:body updates]]]
  ;; ...
  )

;; On the server, we can automatically transform namespaces of such functions
;; into a versioned web API, with automatic schema validation of inputs and
;; outputs (and automatic plumbing of resources, using Graph)

;; In ClojureScript, we can directly use these same schemas to test our code and
;; validate our requests and responses.

;; On other clients, when we're forced to leave Clojure behind, we can
;; automatically generate model classes that know how to populate themselves
;; from server responses

;;  (Interest.h, Interest.m, InterestUpdateResponse.m, ...)

;; and even a full API class that handles the plumbing to the server.

;;; Your questions and feedback

;; TODO(audience): fill this in