2013 Strange Loop Unsession

Jason Wolfe edited this page Jan 13, 2016 · 2 revisions
(ns strangeloop-schema-unsession
  "An informal introduction to prismatic/schema"
  (:require
   [schema.core :as s]))

;;; Thanks for coming!

;; I'm Jason Wolfe.

;; I work at a small company called Prismatic.

;; We make real-time, personally ranked newsfeeds.

;; See my teammate Jenny Finkel's keynote first thing Wednesday for details.

;;; Clojure @ Prismatic

;; We're a 99% Clojure(Script) shop.

;; A handful of open-source projects
;; - Plumbing/Graph, which I talked about at Strange Loop 2012
;; - Dommy for ClojureScript dom manipulation
;; - Hiphip (array!) for arrays
;; - And now Schema

;; Beyond public GitHub, about 100klocc supporting tens of services
;; - web and social network crawlers
;; - document analysis
;; - distributed indices
;; - api for real-time newsfeeds
;; - web server with client-side cljs

;; Possible with 6 engineers partially because of Clojure, and in particular:
;; - (almost) everything can be manipulated as data
;; - ubiquitous sequence and map abstractions

;;; With great power ...

;; One of the biggest difficulties we've encountered with our growing Clojure
;; codebase and team is the overhead of understanding the kind of data
;; being passed around and manipulated.

(defn update-share-counts [share-counts updates]
  (reduce
   (fn [result {:keys [user-id share-type delta]}]
     (update-in result
                [share-type (str user-id)]
                (fnil + 0) delta))
   share-counts updates))

;; What is a share-counts?  How about updates?

;; Maybe a docstring will help?

(defn update-share-counts
  "Increment share-counts according to the share actions in updates:
    share-counts: map from user-id to map of share-type
    (must be one of :twitter, :facebook, or :email) to
    count of number of shares (a long)
    updates: sequence of maps {:user-id :share-type :delta}, where delta is
    amount to increment share-type by
  returns updated share-counts reflecting shares in updates"
  [share-counts updates]
  ;; etc
  )

;; This is undoubtedly an improvement.  But:
;; - not machine readable
;;   - can't help us find bugs
;;   - prone to bit-rot
;; - not (very) human readable either...
;;   - many ways to phrase this content
;;   - hard to adopt uniform standards and get good at easily reading such
;;     descriptions
;;   - no abstraction -- if data types are reused, becomes very repetitive

;; Enter Schema

(def ShareType (s/enum :twitter :facebook :email))
(def ShareCounts {(s/named Long 'user-id) {ShareType (s/named Long 'count)}})

(s/defn update-share-counts :- ShareCounts
  [share-counts :- ShareCounts
   updates      :- [{:user-id Long :share-type ShareType :delta Long}]]
  ;; etc
  )

;; By default, functionally identical to the normal defn ...
;; but with crisp documentation that is:
;; - easy to read (IMO)
;; - precise
;; - reusable
;; - composable

;; ...

;;; Moreover, Schemas are composable data, and print like their definitions:

;; ==> {java.lang.Long {(enum :facebook :email :twitter) java.lang.Long}}

;;; Schemas can be used for validation, and provide nice error messages

;; s/check returns nil for success, or something that looks like the
;; bad parts of your data

(s/check ShareCounts
         {12 {:twitter 10 :facebook 15}})
;; ==> nil

(s/check ShareCounts
         {12 {:twitter 10}
          13 {:twitter 1.4}
          "fred" {:facebook 10}})
;; ==> {13 {:twitter (not (instance? java.lang.Long 1.4))}
;;      (not (instance? java.lang.Long "fred")) invalid-key}

;; s/validate is like (assert (not (s/check ...)))

(try
  (s/validate ShareCounts
              {12 {:twitter 10}
               13 {:twitter 1.4}
               "fred" {:facebook 10}})
  (catch Exception e e))
;; ==> #<RuntimeException: Value does not match schema:
;;      {(not (instance? java.lang.Long "fred")) invalid-key
;;       13 {:twitter (not (instance? java.lang.Long 1.4))}}>
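
;; A minimal sketch of that relationship, not schema's actual implementation:
;; validate-sketch is a hypothetical helper that throws when s/check reports
;; a problem and otherwise hands the value back.

(defn validate-sketch [schema value]
  (if-let [error (s/check schema value)]
    ;; keep the error data around so callers can inspect the bad parts
    (throw (ex-info "Value does not match schema" {:error error}))
    value))

(validate-sketch ShareCounts {12 {:twitter 10}})
;; ==> {12 {:twitter 10}}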

;;; And if you attach schemas to your functions with s/defn or s/fn, you can
;; optionally turn on fn schema validation at runtime (e.g., in tests):

(try
  (s/with-fn-validation (update-share-counts
                         {12 {:fakeblock 10}}
                         [{:user-id 10 :share-type :twitter :delta 12.0}]))
  (catch Exception e e))
;; ==> #<RuntimeException: Input to update-share-counts does not match schema:
;;     [(named {12 {(not (#{:facebook :email :twitter} :fakeblock)) invalid-key}} share-counts)
;;      (named [{:delta (not (instance? java.lang.Long 12.0))}] updates)]>

;;; Outline

(def ThisUnsession
  {0 (s/both Short (s/eq :tour))
   1 (s/enum :why? :how?)
   2 (s/pred next)
   3 (s/named s/Any 'discussion)
   double [(s/either (s/eq :comments) (s/eq :questions))]})

;;; Meet Schema

;;; Schema is a lightweight Clojure(Script) library for declarative data shape
;; description and validation.

;; We love the simplicity, flexibility, and pithiness of Clojure functions.

;; But as our codebase and team grew, we increasingly found the need for equally
;; crisp declarations of key data shapes, for readability and maintainability.

;;; Start simple: all type hints are valid Schemas:

(s/check String "good")
;; ==> nil

(s/check String :bad)
;; ==> (not (instance? java.lang.String :bad))

(s/check long 12)
;; ==> nil

;;; A few fancier 'leaf' schemas:

(s/check s/Any :whatever)
;; ==> nil

(s/check (s/enum :a :b :c) :a)
;; ==> nil

(s/check (s/enum :a :b :c) :Z)
;; ==> (not (#{:a :c :b} :Z))

(s/check (s/pred odd?) 3)
;; ==> nil

;;; Higher-order schemas

(s/check (s/maybe String) "asdf")
(s/check (s/maybe String) nil)
;; ==> nil

(s/check (s/either String long) "a")
(s/check (s/either String long) 1)
;; ==> nil

(s/check (s/both long (s/pred odd?)) 11)
;; ==> nil

(s/check (s/both long (s/pred odd?)) 12)
;; ==> (not (#<core$odd_QMARK_ clojure.core$odd_QMARK_@4c6ed9d1> 12))

(s/check (s/named long 'UserID) "bob")
;; ==> (named (not (instance? java.lang.Long "bob")) UserID)

;;;;; Data Structures

;;; Sequences

;; [schema] is a uniform sequence

(s/check [String] ["abc" "foo" "123"])
;; ==> nil

(s/check [String] ["abc" :foo "123"])
;; ==> [nil (not (instance? java.lang.String :foo)) nil]

;; positional constraints can be expressed with s/one (and s/optional):

(def ScoredLabel
  [(s/one String "label") (s/one Double "score")])

(s/check ScoredLabel ["a" 1.0])
;; ==> nil

(s/check ScoredLabel [:foo])
;; ==> [(named (not (instance? java.lang.String :foo)) "label")
;;      (not (present? "score"))]
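
;; s/optional, mentioned above, marks a trailing positional element that may
;; be left off. A small sketch (OptionallyScoredLabel is a made-up name for
;; illustration):

(def OptionallyScoredLabel
  [(s/one String "label") (s/optional Double "score")])

(s/check OptionallyScoredLabel ["a"])
(s/check OptionallyScoredLabel ["a" 1.0])
;; ==> nil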

;;; Maps

;; {key-schema val-schema} is a uniform map

(s/check {Long [String]} {1 ["a" "b"] 2 []})
;; ==> nil

;; specific key constraints can be expressed with s/required-key and
;; s/optional-key.

(def ScoredLabelMap
  {(s/required-key :label) String
   (s/optional-key :score) Double})

(s/check ScoredLabelMap {:label "a"})
(s/check ScoredLabelMap {:label "a" :score 1.0})
;; ==> nil

(s/check ScoredLabelMap {:label "a" :another-key :another-val})
;; ==> {:another-key disallowed-key}

;; the two can also be combined, and
;; s/required-key can be omitted for keywords

(s/check {:foo String Long Long} {:foo "a" 1 2 3 4})
;; ==> nil
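
;; For specific keys that are not keywords, the wrapper should be spelled
;; out. A small sketch (NamedThing is a made-up schema for illustration):

(def NamedThing
  {(s/required-key "name") String
   (s/optional-key "score") Double})

(s/check NamedThing {"name" "a"})
(s/check NamedThing {"name" "a" "score" 1.0})
;; ==> nil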

;;; Complex data shapes can be built up from components, and checked
;; with helpful error messages

(def StampedNames
  {:date Long
   :names [String]})

(def ScoredLabel
  [(s/one String "label") (s/one Double "score")])

(defn OddNumber [number-type] ;; like generics, kinda
  (s/both number-type (s/pred odd?)))

(def FooBar
  {:stamped-strings (s/maybe StampedNames)
   (s/optional-key :scored-labels) [ScoredLabel]
   String (OddNumber Long)})

(s/check FooBar
         {:stamped-strings nil})
;; ==> nil

(s/check FooBar
         {:stamped-strings {:date 123 :names ["foo" "bar"]}
          :scored-labels [["label1" 1.0] ["label2" 2.0]]
          "another-key" 11})
;; ==> nil

(s/check FooBar
         {:stamped-strings {:date 123 :names ["foo" :bar]}
          :scored-labels [["label1" 1.0] ["label2"]]
          "another-key" 12})
;; ==> {:stamped-strings {:names [nil (not (instance? java.lang.String :bar))]}
;;      :scored-labels [nil [nil (not (present? "score"))]]
;;      "another-key" (not (odd? 12))}

;;; Schema also provides a natural way to attach and validate schemas on
;; defrecord fields:

(s/defrecord RStampedNames
    [date :- s/Int
     names :- [s/String]])

(s/check RStampedNames
         (->RStampedNames 10 ["a" :b]))
;; ==> {:names [nil (not (instance? java.lang.String :b))]}

;;; As well as a way to provide schemas for function inputs and outputs

(s/defn stamped-names :- RStampedNames
  [names :- [s/String]]
  (->RStampedNames (System/currentTimeMillis) names))

(s/explain (s/fn-schema stamped-names))
;; ==> (=> (record strangeloop_schema_unsession.RStampedNames
;;                 {:date Int, :names [java.lang.String]})
;;         [java.lang.String])

(stamped-names ["a" :b])
;; ==> {:date 1379537133893, :names ["a" :b]}

(comment
  (s/with-fn-validation
    (stamped-names ["a" :b]))
  ;; java.lang.RuntimeException: Input to stamped-names does not match schema:
  ;;  [(named [nil (not (instance? java.lang.String :b))] names)]
  )

;;; A powerful subset of Schemas (json+) can be shared between Clojure
;; and ClojureScript
;; - enables schema sharing of API inputs and outputs
;; - less code duplication between client and server, better team communication

(def StampedNames
  {:date s/Int
   :names [s/String]})

;;; Design Goals

;;; #1: Schemas should be simple data.

;; Schemas are composable

;; Schemas are inspectable and transformable
;; - and so are validation errors

;; Validation is based on a simple, open protocol, so you can make your
;; own schema types and combinations

(extend-protocol s/Schema
  ;; #+cljs cljs.core.PersistentHashSet #+clj ;; if this were a cljx file...
  clojure.lang.APersistentSet
  (check [this x]
    (or (when-not (set? x)
          (schema.macros/validation-error this x (list 'set? (schema.utils/value-name x))))
        (when-let [out (seq (keep #(s/check (first this) %) x))]
          (set out))))
  (explain [this] (set [(s/explain (first this))])))

;; Only ~600 LOC for Clojure(Script) schemas
;; (plus a few hundred more for s/defrecord and s/defn)

;;; #2: Compared to a docstring or comment with the same information,
;; a schema should always be less hassle to write, and easier to read.

;; Schemas should gracefully extend built-in type hints
;; - provided next to arguments and return types, on defrecords and defn
;; - type hints are valid schemas, and simple schemas act as type hints
;; - more complex schemas can be defined inline

;; Schemas should be optional, and never constrain what you can write

;; Schemas should be able to express arbitrary constraints

;; Validation should be off by default, so you don't have to think twice
;; about performance

;;; #3: Schemas should be incrementally useful.

;; First, define schemas for key data types in your namespace
;; - documentation
;; - manual data validation entering/exiting a system, with nice error msgs

;; Next, annotate key public functions in your namespace with schemas
;; - better documentation
;; - turn on validation at test-time to catch 'type-like' bugs
;;   (or run-time, if you like)
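
;; One way to do this with clojure.test, sketched under the assumption that a
;; plain fixture is enough for your test namespace: wrap the test run in
;; s/with-fn-validation so s/defn schemas are checked only while testing.
;; (with-schema-validation is a hypothetical fixture name.)

(require '[clojure.test :as t])

(defn with-schema-validation
  [run-tests]
  ;; every s/defn or s/fn called during run-tests validates inputs and outputs
  (s/with-fn-validation (run-tests)))

(t/use-fixtures :once with-schema-validation)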

;; You could go on to annotate everything...
;; - but if you want a full-blown type system, you want core.typed
;; - core.typed can do static validation based on consistency of annotations
;;   throughout your program, with no runtime cost
;; - Schema is designed to make the most of a few annotations in critical places
;; - But there may be interesting synergies...

;;; What's next?

;;; Generate core.typed annotations on functions?

;;; Generate test data from schemas?

;;; Generate model classes for clients and even full client APIs (WIP!)

(def Interest {:type (s/enum "topic" "publisher" "user") :key s/String})

(require '[plumbing.core :as plumbing])

(plumbing/defnk interests$update
  {:methods      [:post]
   :query-params {}
   :body         {:updates [{:op (s/enum "add" "remove")
                             :interest Interest}]}
   :description  "Update the interests of the current user"
   :returns      {:interests [Interest]}}
  [env user-store [:request user [:body updates]]]
  ;; ...
  )

;; On the server, we can automatically transform namespaces of such functions
;; into a versioned web API, with automatic schema validation of inputs and
;; outputs (and automatic plumbing of resources, using Graph)

;; In ClojureScript, we can directly use these same schemas to test our code and
;; validate our requests and responses.

;; On other clients, when we're forced to leave Clojure behind, we can
;; automatically generate model classes that know how to populate themselves
;; from server responses

;;  (Interest.h, Interest.m, InterestUpdateResponse.m, ...)

;; and even a full API class that handles the plumbing to the server.

;;; Your questions and feedback

;; TODO(audience): fill this in