mini-webpack #1

mekefly · 2022-08-22T07:57:14Z

mini-webpack

This is a mini bundler, written to show the core principles behind bundling tools in a small amount of code.

To go straight to the source code, see this repository
mini-webpack: https://github.com/mekefly/mini-wabpack

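What is webpack?

At its core, webpack is a static module bundler for modern JavaScript applications. When webpack processes your application, it internally builds a dependency graph from one or more entry points and then combines every module your project needs into one or more bundles, which are static assets used to serve your content.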

Why build this?

The goal is simply to understand more deeply how webpack works and how Node works, which paves the way for future work.

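The tasks we need to complete

Build the dependency graph
- Extract dependencies
  - Read the file contents
  - Generate the abstract syntax tree
  - Derive dependency information from the abstract syntax tree
  - Convert import to require in the code
- Flatten the dependencies
Compose the result
- Template
  - How the template wraps modules
  - A hand-written require
Write the file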

So let's get started

We will build this project in small steps.

To finish it, we need to solve a series of problems, working from small to large.

Reading the file contents

Not much to say here: we import the fs package and read the contents of the given file.

import { readFileSync } from "fs";

export function getFileContent(filePath: string) {
  return readFileSync(filePath, { encoding: "utf-8" });
}

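Note that we need to pass encoding so that the file is decoded as text. Without it, readFileSync returns a raw Buffer instead of a string.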

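Generating the abstract syntax tree

The abstract syntax tree (AST) is a big topic: it is the result of parsing source text into a structure that a program can work with. This project does not hand-write a parser for now. I may write a mini parser in another project, so stay tuned.

For the AST there is an existing project, @babel/parser, that we can use directly to generate the tree.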

pnpm install @babel/parser -D

We can use its exported parse method to produce the syntax tree.

For example

import { parse } from "@babel/parser";
parse("console.log('hello')");

However, when the source uses syntax like import {foo} from 'foo', parse throws an error

SyntaxError: 'import' and 'export' may appear only with 'sourceType: "module"' (1:0)

...
...

The error message is clear: import is only allowed when sourceType is module, so we add that option.

import { parse } from "@babel/parser";

export function genAst(text: string) {
  return parse(text, { sourceType: "module" });
}

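Deriving dependency information from the abstract syntax tree

We can get the dependency information by traversing the nodes of the syntax tree and finding the import nodes.

To traverse the AST we can use a tool that Babel provides, @babel/traverse.

Let's install it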

pnpm install @babel/traverse -D

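A small tool such as https://astexplorer.net/ lets us inspect the syntax tree.

There we can see an ImportDeclaration node. That is the syntax tree node for an import statement, and it is the one we need.

We can get hold of this node as shown below.

Other node types can be visited in the same way.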

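import traverse from "@babel/traverse";
import { genAst } from "./genAst";

const ast = genAst("import { foo } from './foo.js'");
traverse(ast, {
  // The visitor receives a NodePath; the AST node itself is path.node
  ImportDeclaration(path) {
    console.log(path.node);
  },
});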

Let's flesh the code out

import traverse from "@babel/traverse";
import { genAst } from "./genAst";
export function dependencyAnalysis(ast: ReturnType<typeof genAst>) {
  // List of the dependency paths found in this file
  const dep: string[] = [];
  traverse(ast, {
    ImportDeclaration(path) {
      // The path as written in the import; it may be relative or absolute. Resolving it is left to the `path` module later
      const depPath = path.node.source.value;
      dep.push(depPath);
    },
  });
  // Return the list of dependencies
  return dep;
}
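
To make the traversal less of a black box, here is a minimal sketch of the same idea without Babel: a hand-rolled walk over a simplified, hand-written tree. The node shapes (`AstNode`) and the helpers `walk` and `collectImports` are illustrative assumptions modelled on the `type` and `source.value` fields used above. They are not Babel's real types or API.

```typescript
// Simplified node shape, modelled on the fields used above (type, source.value).
// This is an illustrative assumption, not Babel's real type definition.
type AstNode = { type: string; [key: string]: unknown };

function isNode(value: unknown): value is AstNode {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Depth-first walk that calls `visit` for every node in the tree.
function walk(node: AstNode, visit: (node: AstNode) => void): void {
  visit(node);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (isNode(child)) walk(child, visit);
    }
  }
}

// Collect the source path of every ImportDeclaration, in source order.
function collectImports(ast: AstNode): string[] {
  const dep: string[] = [];
  walk(ast, (node) => {
    if (node.type !== "ImportDeclaration") return;
    const source = node.source;
    if (isNode(source) && typeof source.value === "string") {
      dep.push(source.value);
    }
  });
  return dep;
}

// Hand-written tree standing in for:
//   import { foo } from "./foo.js";
//   import bar from "./bar.js";
//   foo();
const sampleAst: AstNode = {
  type: "Program",
  body: [
    { type: "ImportDeclaration", source: { type: "StringLiteral", value: "./foo.js" } },
    { type: "ImportDeclaration", source: { type: "StringLiteral", value: "./bar.js" } },
    { type: "ExpressionStatement", expression: { type: "CallExpression" } },
  ],
};

const imports = collectImports(sampleAst);
console.log(imports); // ["./foo.js", "./bar.js"]
```

In the real project @babel/traverse does this walking for us and also tracks scope and parent information, which is why the visitor receives a path object rather than a bare node.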

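Converting import to require

To convert import to require we can use babel-core. The "env" preset used below comes from the babel-preset-env package, so install that as well.

pnpm install babel-core babel-preset-env -D

Usage is simple: we call the transform function it provides to do the conversion

import { transform } from "babel-core";

function importToRequire(file: string) {
  return transform(file, { presets: ["env"] }).code ?? "";
}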

Extracting the dependency information

We combine the useful information extracted above and export it, ready for the next step

export let id = 0;
export type Dependencies = ReturnType<typeof getDependencies>;
export function getDependencies(fileFullPath: string, path: string) {
  const file = getFileContent(fileFullPath);
  const ast = genAst(file);
  const dep = dependencyAnalysis(ast);
  const code = importToRequire(file);

  return {
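    // Why an id? It uniquely identifies the module. fullPath would also work, but the bundle may run on other people's machines, where a full path would expose private details and is not needed anyway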
    id: id++,
    dep,
    fullPath: fileFullPath,
    path,
    code,
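    // mapping maps the paths this module imports to module ids, e.g. {"./foo.js": 1}
    // When this module's code calls require, the path is looked up in mapping to get the target's unique id, and the module is then found by that id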
    mapping: {} as any,
  };
}

Why mapping is needed is explained below, under "So how do we find the target function?"

Building the dependency graph

To build the dependency graph we need to solve the following problems

What about circular dependencies?

//foo.js
const bar = require("bar.js")
//bar.js
const foo = require("foo.js")

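How do we stop this from looping forever?

We can keep a set of the files that have already been loaded
const set = new Set()
set.add("foo.js")
set.add("bar.js")

If, while looking up dependencies, we find that bar.js has already been processed, we skip it and move on to the next dependency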
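
The idea can be tried out in isolation. The sketch below is an illustration rather than the project's code: the `requires` table and the `visitOnce` helper are hypothetical names, and the table stands in for what would normally be read from the files.

```typescript
// Hypothetical dependency table: each file lists the files it requires.
const requires: Record<string, string[]> = {
  "foo.js": ["bar.js"],
  "bar.js": ["foo.js"], // points back at foo.js, forming a cycle
};

// Visit `file` and everything it depends on, processing each file at most once.
function visitOnce(file: string, done: Set<string>, order: string[]): void {
  if (done.has(file)) return; // already handled: skip instead of looping forever
  done.add(file); // record before descending so the cycle is cut
  order.push(file);
  for (const dep of requires[file] ?? []) {
    visitOnce(dep, done, order);
  }
}

const processed: string[] = [];
visitOnce("foo.js", new Set<string>(), processed);
console.log(processed); // ["foo.js", "bar.js"]
```

Without the `done.has(file)` check the two files would keep visiting each other until the call stack overflows.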

How do we flatten the dependencies?

Here I use a breadth-first search

For example

i => [a,b]
a => [c,d]
b => [e,f]

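We traverse i and add its dependencies to the graph, which gives

graph:[a,b]

Then we walk the array and keep flattening

index: 0
At index 0 we check the dependencies of a and append them to the graph
[a,b,c,d]

index: 1
At index 1 we check the dependencies of b and append them to the graph
[a,b,c,d,e,f]

....

Carrying on like this flattens every dependency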
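
The walkthrough above can be sketched as a few lines of code. This is a simplified illustration with a hard-coded table (`exampleDeps`) and a hypothetical `flatten` helper. Unlike the walkthrough, the result also contains the entry i itself, which matches how the real graph starts with the entry module.

```typescript
// The example from the text: i depends on a and b, and so on.
const exampleDeps: Record<string, string[]> = {
  i: ["a", "b"],
  a: ["c", "d"],
  b: ["e", "f"],
};

// Flatten the dependency tree breadth-first. The queue doubles as the result:
// entries appended during the loop are visited by later iterations.
function flatten(entry: string): string[] {
  const queue: string[] = [entry];
  const seen = new Set<string>(queue);
  for (const current of queue) {
    for (const dep of exampleDeps[current] ?? []) {
      if (seen.has(dep)) continue; // already queued, nothing to do
      seen.add(dep);
      queue.push(dep);
    }
  }
  return queue;
}

const flat = flatten("i");
console.log(flat); // ["i", "a", "b", "c", "d", "e", "f"]
```

Iterating over an array while pushing to it works because the loop keeps going until it reaches the current end of the array, so no separate queue structure is needed.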

With these problems solved, we arrive at the following code

import { resolve, dirname } from "path";
import { getDependencies, id } from "./getDependencies";
import { mainPath } from "./index";

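// Module ids are recorded here ahead of time: with circular dependencies the id may not be available yet, so we record it early for later lookup and then map each relative path to a module id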
const modulesMapping: any = {};
export function genGraph() {
  const fullPath = resolve(mainPath);

  // Dependency info of the entry module
  const dependencies = getDependencies(fullPath, mainPath);
  modulesMapping[fullPath] = 0;

  // Paths that have already been flattened
  const completedPath: Set<string> = new Set();

  // Mark the entry file as done
  completedPath.add(fullPath);
  const graph: Array<ReturnType<typeof getDependencies>> = [dependencies];

  // Walk the graph; entries pushed during the loop are visited too
  for (const { dep, fullPath: filePath, mapping } of graph) {
    // Flatten this module's dependencies into the graph
    flattening(dep, filePath, mapping, completedPath, graph);
  }
  return graph;
}
function flattening(
  dep: string[],
  filePath: string,
  mapping: any,
  completedPath: Set<string>,
  graph: {
    id: number;
    dep: string[];
    fullPath: string;
    path: string;
    code: string;
    mapping: any;
  }[]
) {
  dep.forEach((path) => {
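    // Absolute path of the dependency
    const fullPath = resolve(dirname(filePath), path);

    // Map the relative path to the module id
    mappingId(fullPath, mapping, path);

    // Skip dependencies that have already been flattened
    if (loaded(completedPath, fullPath)) {
      return;
    }

    // Extract the dependency's own info
    const dependencies = getDependencies(fullPath, path);
    // Append it to the graph
    graph.push(dependencies);

    // Record the dependency itself (not the importing file) as processed
    completedPath.add(fullPath);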
  });
}
function loaded(completedPath: Set<unknown>, fullPath: string) {
  return completedPath.has(fullPath);
}

// Map the dependency's relative path to its module id
function mappingId(fullPath: string, mapping: any, path: string) {
  if (typeof modulesMapping[fullPath] === "undefined") {
    mapping[path] = modulesMapping[fullPath] = id;
  } else {
    mapping[path] = modulesMapping[fullPath];
  }
}

How the template wraps modules

The approach I settled on is based on require.
For example

// index.js
const {foo} = require("./foo.js")

// foo.js
exports.foo = function () {
  console.log("This is Foo");
}

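Can we simply concatenate them?

Of course not: different files may declare variables with the same name

So how should they be merged?

What we want is for the code of both files to run correctly inside a single file

But if two files declare the same variable, they will interfere with each other

So we need to wrap each one in a function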

function () {
  const {foo} = require("./foo.js")
}
function () {
  exports.foo = function () {
    console.log("This is Foo");
  }
}

Then we give it the parameters it needs

function (require,module,exports) {
  const {foo} = require("./foo.js")
}
function (require,module,exports) {
  exports.foo = function () {
    console.log("This is Foo");
  }
}

This also lets us support the style below, which is the CommonJS (cjs) module convention

function (require,module,exports) {
  function foo() {
    console.log("This is Foo");
  }
  module.exports = {
    foo
  }
}
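
A small runnable sketch may help show why the wrapper is enough. The names here (`WrappedModule`, `ModuleRecord`, `run`) are hypothetical and only serve this illustration: two wrapped modules declare the same local variable, one exports through `exports` and the other by replacing `module.exports`, and each is run by hand.

```typescript
// Shape of the objects handed to a wrapped module (illustrative names).
type ModuleRecord = { exports: Record<string, unknown> };
type WrappedModule = (
  require: (path: string) => unknown,
  module: ModuleRecord,
  exports: Record<string, unknown>
) => void;

// Both modules declare a local `label`; the function scopes keep them apart.
const first: WrappedModule = (require, module, exports) => {
  const label = "first";
  exports.describe = () => label;
};
const second: WrappedModule = (require, module, exports) => {
  const label = "second";
  module.exports = { describe: () => label }; // replaces the exports object
};

// Run one wrapped module by hand and return whatever it exported.
function run(fn: WrappedModule): Record<string, unknown> {
  const module: ModuleRecord = { exports: {} };
  const unusedRequire = (path: string): unknown => {
    throw new Error(`require("${path}") is not wired up in this sketch`);
  };
  fn(unusedRequire, module, module.exports);
  return module.exports; // read from module, so a reassignment is honoured
}

const describeFirst = run(first).describe as () => string;
const describeSecond = run(second).describe as () => string;
console.log(describeFirst(), describeSecond()); // first second
```

Returning `module.exports` rather than the original `exports` object is what makes the `module.exports = {...}` style work.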

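So how do we call them?

That is covered next

A hand-written require

Let's first recall what require does

Parameter: a path
Result: whatever running the module at that path produces (exports)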

function require(path) {
  // Find the target function
  // Create the module object
  const module = {exports:{}}
  // Run it
  target(require,module,module.exports)

  // Return exports
  return module.exports
}

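So how do we find the target function?

We can use a mapping

Imagine a map whose keys are paths and whose values are the corresponding functions: we could then look up a module by path and run it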

const modules = {
  "foo.js": function (require,module,exports) {
    exports.foo = function () {
      console.log("This is Foo");
    }
  },
  "index.js": function (require,module,exports) {
    const {foo} = require("./foo.js")
  }
}

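But is that enough?

No: dependencies can collide

Note that all of these are relative paths

Imagine directory a contains index.js and foo.js, and directory b also contains index.js and foo.js. In both, index.js imports foo.js

The index.js in the root directory then imports both /a/index.js and /b/index.js

After flattening there are two entries named foo.js

/index.js, /a/index.js ,/b/index.js ,foo.js ,foo.js

What can we do?

We can use a per-module mapping. This is where the module id mentioned earlier comes in: it uniquely identifies each module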

const modules = {
  0: [function (require,module,exports) {
    const {foo} = require("./foo.js")
  },
  {"./foo.js": 1}
  ],
  1: [function (require,module,exports) {
    exports.foo = function () {
      console.log("This is Foo");
    }
  },{}
  ],
}

When require runs, we look up the flattened module id in the calling module's own mapping table,
and then use that unique id to find the module
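
As a sketch of how this resolves the /a and /b collision described earlier, the example below builds the module table by hand. The names `table`, `ModuleFn`, and `loadById` are hypothetical, and caching is left out on purpose to keep the focus on the mapping.

```typescript
type ModuleFn = (
  require: (path: string) => any,
  module: { exports: any },
  exports: any
) => void;

// Hand-written module table for the /a and /b example: [function, local mapping].
// The two index.js files use the same relative path but map it to different ids.
const table: Record<number, [ModuleFn, Record<string, number>]> = {
  0: [
    (require, module, exports) => {
      exports.names = [require("./a/index.js").name, require("./b/index.js").name];
    },
    { "./a/index.js": 1, "./b/index.js": 2 },
  ],
  1: [(require, module, exports) => { exports.name = require("./foo.js").name; }, { "./foo.js": 3 }],
  2: [(require, module, exports) => { exports.name = require("./foo.js").name; }, { "./foo.js": 4 }],
  3: [(require, module, exports) => { exports.name = "foo from /a"; }, {}],
  4: [(require, module, exports) => { exports.name = "foo from /b"; }, {}],
};

// Load a module by id (no cache here, to keep the focus on the mapping).
function loadById(id: number): any {
  const [fn, mapping] = table[id];
  const module = { exports: {} as any };
  // Each module gets its own require that translates paths through its mapping.
  const localRequire = (path: string) => loadById(mapping[path]);
  fn(localRequire, module, module.exports);
  return module.exports;
}

const names: string[] = loadById(0).names;
console.log(names); // ["foo from /a", "foo from /b"]
```

Both index.js modules call `require("./foo.js")`, yet each gets its own foo.js because the lookup goes through the caller's mapping first.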

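There are other cases too. For example, what if require is called several times for the same module?

We can use a cache: store the result of running the module in a cache object, and on later calls return the value from there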

  var cache = {};

  function require(id){
    if(cache[id]) return cache[id].exports;
    ......
    const module = cache[id] = {......}
    ......
  }
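
To check that the cache behaves as intended, here is a minimal runnable sketch. It uses numeric ids directly and hypothetical names (`cachedTable`, `moduleCache`, `cachedRequire`, `runs`), and counts how often a module body executes.

```typescript
type CachedModuleFn = (
  require: (id: number) => any,
  module: { exports: any },
  exports: any
) => void;

let runs = 0; // how many times module 1's body has executed

const cachedTable: Record<number, CachedModuleFn> = {
  0: (require, module, exports) => {
    // Requiring the same module twice should hand back the same exports object.
    exports.same = require(1) === require(1);
  },
  1: (require, module, exports) => {
    runs++;
    exports.value = 42;
  },
};

const moduleCache: Record<number, { exports: any }> = {};

function cachedRequire(id: number): any {
  if (moduleCache[id]) return moduleCache[id].exports; // hit: do not run again
  // Store the record before running the body, as the snippet above does.
  const module = (moduleCache[id] = { exports: {} });
  cachedTable[id](cachedRequire, module, module.exports);
  return module.exports;
}

const entryExports = cachedRequire(0);
console.log(entryExports.same, runs); // true 1
```

Storing the record in the cache before running the module body also means that a circular require gets the partially filled exports back instead of recursing forever.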

At this point we have the complete code

(function (modules) {
  var cache = {};
  require(0);
  function require(id) {
    // Cache
    if (cache[id]) return cache[id].exports;

    // Find the target function

    var m = modules[id];
    var fn = m[0];
    var mapping = m[1];

    // Translate a local path to a module id through this module's mapping
    function localRequire(path) {
      return require(mapping[path]);
    }

    // Create the module object

    var module = (cache[id] = { exports: {} });

    // Run it
    fn(localRequire, module, module.exports);

    // Return the exports
    return module.exports;
  }
})({
  0: [
    function (require, module, exports) {
      const {foo} = require("./foo.js")
    },
    { "./foo.js": 1 },
  ],
  1: [
    function (require, module, exports) {
      exports.foo = function () {
        console.log("This is Foo");
      }
    },
    {},
  ],
});

Composing the result

Generate the code from the template

import { Dependencies } from "./getDependencies";

export function generateCode(graph: Dependencies[]) {
  return `
(function (modules) {
  var cache = {};
  require(0);
  function require(id) {
    if (cache[id]) {
      return cache[id].exports;
    }
    var m = modules[id];
    var fn = m[0];
    var mapping = m[1];
    function localRequire(path) {
      return require(mapping[path]);
    }
    var module = (cache[id] = { exports: {} });
    fn(localRequire, module, module.exports);
    return module.exports;
  }
})({
  ${graph
    .map((d) => {
      return `
    ${d.id}:[
      function (require, module, exports) {
        ${d.code}
      },
      ${JSON.stringify(d.mapping)}
    ]
    `;
    })
    .join(",")}
});
  `;
}
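
To see the template produce something that actually runs, here is a self-contained sketch. `renderBundle` is a condensed variant of the template written for this example, not the project's function, and it returns the entry module's exports so the result can be inspected. `GraphEntry` and the hand-written `sampleGraph` are likewise assumptions standing in for the output of genGraph().

```typescript
// Minimal shape of a graph entry, limited to the fields the template reads.
type GraphEntry = { id: number; code: string; mapping: Record<string, number> };

// Condensed variant of the template above; it returns the entry module's
// exports so the result can be inspected (that return is an addition here).
function renderBundle(graph: GraphEntry[]): string {
  const entries = graph
    .map(
      (d) =>
        `${d.id}: [function (require, module, exports) {\n${d.code}\n}, ${JSON.stringify(d.mapping)}]`
    )
    .join(",\n");
  return `(function (modules) {
  var cache = {};
  function require(id) {
    if (cache[id]) return cache[id].exports;
    var m = modules[id];
    var module = (cache[id] = { exports: {} });
    m[0](function (path) { return require(m[1][path]); }, module, module.exports);
    return module.exports;
  }
  return require(0);
})({
${entries}
})`;
}

// Hand-written graph standing in for the output of genGraph().
const sampleGraph: GraphEntry[] = [
  { id: 0, code: `exports.message = require("./foo.js").foo();`, mapping: { "./foo.js": 1 } },
  { id: 1, code: `exports.foo = function () { return "This is Foo"; };`, mapping: {} },
];

const bundleSource = renderBundle(sampleGraph);
// Evaluate the generated text as an expression to check that it runs.
const bundleResult = new Function(`return ${bundleSource};`)();
console.log(bundleResult.message); // This is Foo
```

The explicit "," separator matters: the module entries become properties of one object literal, so they have to be comma-separated in the generated text.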

Writing the file

import { genGraph } from "./genGraph";
import { generateCode } from "./template";
import { writeFileSync } from "fs";
import { resolve } from "path";

const buildPath = "build/index.js";
export const mainPath = "./example/index.js";

// Build the graph, render the bundle, and write it to disk
export function builder() {
  const graph = genGraph();
  const code = generateCode(graph);
  writeIn(code);
}

// Write the generated bundle to buildPath
export function writeIn(text: string) {
  writeFileSync(resolve(buildPath), text);
}

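That's it, finally done. For more detail, see the source repository, whose address is near the top of this post. If you like it, feel free to give it a star and join the discussion!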
